PATENT 

ATTORNEY DOCKET NO. 50287/007002 


Certificate of Mailing: Date of Deposit 


February 23, 2004


I hereby certify under 37 C.F.R. § 1.8(a) that this correspondence is being deposited with the United States Postal 
Service as first class mail with sufficient postage on the date indicated above and is addressed to the 
Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450.

Rosemarie Perullo

Printed name of person mailing correspondence / Signature of person mailing correspondence


IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 


Applicant:    Sakari Kauppinen et al.      Art Unit:      Not yet assigned
Serial No.:   10/690,487                   Examiner:      Not yet assigned
Filed:        October 21, 2003             Customer No.:  21559
Title:        Oligonucleotides Useful for Detecting and Analyzing Nucleic Acids of Interest


Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 


SUBMISSION OF PRIORITY DOCUMENT 

In connection with the above-referenced application, Applicants submit herewith 
a certified copy of Danish Patent Application PA 2003-00752, as filed on May 19, 2003, with the 
Danish Patent Office. Applicants petition for any necessary extensions of time for submission of 
this document. In addition, if there are any charges or any credits, please apply them to Deposit 
Account No. 03-2095. 

Respectfully submitted, 


Date: 



rady, Ph.D., P.C 


Clark & Elbing LLP 
101 Federal Street
Boston, MA 02110
Telephone: 617-428-0200 
Facsimile: 617-428-7045 





Kongeriget Danmark 


Patent application No.: PA 2003 00752 

Date of filing: 19 May 2003

Applicant: Exiqon A/S 

(Name and address) Bygstubben 9 

DK-2950 Vedbaek 

Denmark 

Title: Oligonucleotides useful for detecting and analyzing nucleic acids of 
interest 

IPC: - 

This is to certify that the attached documents are exact copies of the 
above mentioned patent application as originally filed. 



Patent- og Varemærkestyrelsen
Received (Modtaget) 19 May 2003

Oligonucleotides Useful for Detecting and Analyzing Nucleic Acids of Interest

The invention relates to nucleic acids and methods for expression profiling of mRNAs,
identifying and profiling of particular mRNA splice variants, and detecting mutations, deletions, or
duplications of particular exons, e.g., alterations associated with a disease such as cancer, in a
nucleic acid sample, e.g., a patient sample.
Background of the Invention 

The field of the invention is oligonucleotides (e.g., oligonucleotide arrays) that are
useful for detecting nucleic acids of interest and for detecting differences between nucleic acid
samples (e.g., samples from a cancer patient and a healthy patient).

DNA chip technology utilizes miniaturized arrays of DNA molecules immobilized on
solid surfaces for biochemical analyses. The power of DNA microarrays as experimental tools
relies on specific molecular recognition via complementary base-pairing, which makes them
highly useful for massively parallel analyses. In the post-genomic era, microarray technology has
thus become the method of choice for many hybridization-based assays, such as expression
profiling, SNP detection, DNA re-sequencing, and genotyping on a genomic scale.

Expression microarrays are capable of profiling gene expression patterns of tens of
thousands of genes in a single experiment. Hence, this technology provides a powerful tool for
deciphering complex biological systems, and thereby greatly facilitates research in basic
biology and living processes, as well as disease diagnostics, theranostics, and drug
development. In a typical cell, the mRNAs are distributed in three frequency classes: (i)
superprevalent (10-20% of the total mRNA mass); (ii) intermediate (40-45%); and (iii)
low-abundant mRNAs (40-45%). It is therefore of utmost importance that the dynamic range and
sensitivity of the expression arrays are optimal, especially when analyzing expression levels of
messages or mRNA splice variants belonging to the low-abundant class.

The recent explosion of interest in DNA microarray technology has been sparked by
two key innovations. The first was the use of non-porous solid support, such as glass or
polymer as opposed to nylon or nitrocellulose filters, which has facilitated miniaturization and
fluorescence-based detection. Roughly 20,000 cDNAs can be robotically spotted onto a
microscope slide and hybridized with a double-labeled probe. The second was the development
of methods for high-density spatial synthesis of oligonucleotides. The two key array
technologies are outlined in the following.
Oligonucleotide arrays 

An efficient strategy for oligonucleotide microarray manufacturing involves DNA
synthesis on solid surfaces using combinatorial chemistry. Most of the current technology is
developed by Affymetrix and Rosetta Inpharmatics. Glass is currently preferred as the
synthesis support because of its inert chemical properties and low level of intrinsic fluorescence
as well as the ability to chemically derivatize the surface. Of the three approaches currently
used to manufacture oligonucleotide arrays, the light-directed deprotection method is the most
effective one in generating high density microchips. A single round of synthesis involves
light-directed deprotection, followed by nucleotide coupling. Photolithographic masking is used to
control the regions of the chip designated for illumination. Affymetrix uses a combination of
photolithography and combinatorial chemistry to manufacture its GeneChip Arrays. Using
technologies adapted from the semiconductor industry, GeneChip manufacturing begins with a
5-inch square quartz wafer. Initially the quartz is washed to ensure uniform hydroxylation
across its surface. The wafer is placed in a bath of silane, which reacts with the hydroxyl groups
of the quartz and forms a matrix of covalently linked molecules. The distance between these
silane molecules determines the probes' packing density, allowing arrays to hold over 500,000
features within 1.28 square centimeters. The principal disadvantage of this method is that a
significant amount of chip design work and cost is associated with the mask design. Once a set
of masks has been made, a large number of chips can be produced at a reasonable cost.
Oligonucleotide arrays available from Affymetrix are currently priced 5-10 fold higher
than cDNA microarrays.

DNA-DNA hybridization using oligonucleotide chips is clearly different from that of
cDNA microarrays. Hybridizations involving oligos are much more sensitive to the GC content
of individual heteroduplexes. In addition, single base mismatches have a pronounced effect on
the hybridization reassociation of short oligos, and point mutations can thus be readily detected
using oligo chips.
cDNA microarrays 

cDNA microarrays containing large DNA segments such as cDNAs are generated by
physically depositing small amounts of each DNA of interest onto known locations on glass
surfaces. Two technologies for printing microarrays are (1) mechanical microspotting, and (2)
ink-jetting. Mechanical microspotting has been extensively used at, e.g., Stanford University,
and it utilizes pins or capillaries to deposit small quantities of DNA onto known addresses using
motion control systems. Recent advances in microspotting technology using modern arraying
robots allow for the preparation of 100 microarrays containing over 10,000 features in less than
12 hours. A DNA arrayer is relatively easy to set up, and the cost is usually low compared to
on-chip oligoarrayers. cDNA microarrays are capable of profiling gene expression patterns of
tens of thousands of genes in a single experiment. To compare the relative abundance of the
arrayed gene sequences in two DNA or RNA samples, e.g., the total mRNA isolated from two
different cell populations, the two samples are first labeled using two different fluorescent dyes
such as Cy-3 and Cy-5. The labeled samples are mixed and hybridized to the clones on the
array slide. After the hybridization, laser excitation of the incorporated, fluorescent target
molecules yields an emission with a characteristic spectrum, which is measured with a confocal
laser scanner. The monochrome images from the scanner are imported to the software in which
the images are pseudo-colored and merged. Data from a single hybridization is viewed as a
normalized ratio in which significant deviations from the ratio are indicative of either increased
or decreased expression levels relative to the reference sample. Data from multiple experiments
can be examined using any number of data mining tools.
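The normalized-ratio readout described above can be illustrated with a minimal sketch. The text does not specify a normalization scheme, so median centering of log2(Cy-5/Cy-3) ratios and the two-fold call threshold are assumptions made here for illustration only; the function names are hypothetical.

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3):
    """Return median-centered log2(Cy5/Cy3) ratios for paired spot intensities.

    Median centering is an assumed normalization; the source text only says
    that data are viewed as a normalized ratio.
    """
    if len(cy5) != len(cy3):
        raise ValueError("both channels must have the same number of spots")
    raw = [math.log2(red / green) for red, green in zip(cy5, cy3)]
    center = median(raw)
    return [value - center for value in raw]


def call_changes(log_ratios, threshold=1.0):
    """Label each spot as up, down, or unchanged relative to the reference.

    A threshold of 1.0 on the log2 scale (two-fold) is an illustrative choice.
    """
    calls = []
    for value in log_ratios:
        if value >= threshold:
            calls.append("up")
        elif value <= -threshold:
            calls.append("down")
        else:
            calls.append("unchanged")
    return calls
```

Because the ratios are centered on their median, a uniform difference in labeling efficiency between the two dyes cancels out before the calls are made.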

Current status of array technology

It has now become clear that cDNA microarrays, originally developed by Pat Brown
and co-workers at Stanford University, are sensitive, but may not be sufficiently specific
with respect to, e.g., discrimination of homologous transcripts in gene families and alternatively
spliced isoforms. On the other hand, the Affymetrix GeneChip system is specific, but may not
be sensitive enough. This lack of sensitivity may explain why Affymetrix uses 16 x 25-mer
perfect match capture probes together with 16 x 25-mer mismatch probes per transcript in its
expression profiling chips, resulting in enormous data sets in genome-wide arrays. Therefore,
the functional genomics field is in the process of switching, as they run out of samples, from
existing PCR-amplified cDNA fragment libraries for microarraying to custom longmer
oligonucleotide arrays comprising transcript-specific oligonucleotide capture probes typically in
the range of 30-mers to 80-mers, thus addressing both specificity and sensitivity.
Alternative splicing 

As the field of genomics research is shifting from the acquisition of genome sequences
to high-throughput functional genomics, there is an increasing need to understand the dynamics
within the genetic regulation as well as RNA and protein sequences in order to elucidate gene
expression in all its complexity. A common feature for eukaryotic genes is that they are
composed of protein-encoding exons and introns. Introns (intra-genic-regions) are non-coding
DNA which interrupt the exons. Introns are characterized by being excised from the
pre-mRNA molecule in RNA splicing, as the sequences on each side of the intron are spliced
together. RNA splicing not only provides functional mRNA, but is also responsible for
generating additional diversity. This phenomenon is called alternative splicing, which results in
the production of different mRNAs from the same gene. The mRNAs that represent isoforms
arising from a single gene can differ by the use of alternative exons or retention of an intron that
disrupts two exons. This process often leads to different protein products that may have related
or drastically different, even antagonistic, cellular functions. There is increasing evidence
indicating that alternative splicing is very widespread (Croft et al. Nature Genetics, 2000).
Recent studies have revealed that at least 60% of the roughly 35,000 genes in the human
genome are alternatively spliced. Clearly, by combining different types of modifications and
thus generating different possible combinations of transcripts of different genes, alternative
splicing is a potent mechanism for generating protein diversity. Analysis of the spliceome, in
turn, represents a novel approach to both functional genomics and pharmacogenomics.
Antisense transcription in eukaryotes 

RNA-mediated gene regulation is widespread in higher eukaryotes and complex genetic
phenomena like RNA interference, co-suppression, transgene silencing, imprinting,
methylation, and possibly position-effect variegation and transvection, all involve intersecting
pathways based on or connected to RNA signalling (Mattick 2001; EMBO reports 2, 11:
986-991). Recent studies indicate that antisense transcription is a very common phenomenon in the
mouse and human genomes (Okazaki et al. 2002; Nature 420: 563-573; Yelin et al. 2003,
Nature Biotechnol.). Thus, antisense modulation of gene expression in e.g. human cells may be
a common regulatory mechanism. In light of this, the present invention provides novel tools, in
which non-naturally occurring nucleic acids, such as LNA oligonucleotides, can be designed to
silence or modulate the regulation of a given mRNA by non-coding antisense RNA, by
designing a complementary sense LNA oligonucleotide for the regulatory antisense RNA. This
has a high potential in target identification, target validation and therapeutic use of LNA
oligonucleotides as modulating and silencing sense nucleic acid agents.
Misplaced control of alternative splicing can cause disease 

The detection of the detailed structure of all transcripts is an important goal for
molecular characterization of a cell or tissue. Without the ability to detect and quantify the
splice variants present in one tissue, the transcript content or the protein content cannot be
described accurately. Molecular medical research shows that many cancers result in altered
levels of splice variants, so an accurate method to detect and quantify these transcripts is
required. Mutations that produce an aberrant splice form can also be the primary cause of
severe diseases such as spinal muscular dystrophy and cystic fibrosis.

Much of the study of human disease, indeed much of genetics, is based upon the study of
a few model organisms. The evolutionary stability of alternative splicing patterns and the
degree to which splicing changes according to mutations and environmental and cellular
conditions influence the relevance of these model systems. At present, there is little
understanding of the rates at which alternative splicing patterns change, and the factors
influencing these rates. Table 1 shows a set of genes that are known to be alternatively spliced
and that are orthologs of known human disease genes.

Table 1. C. elegans disease orthologs that are known to be differentially spliced in C. elegans.

Disease                                  C. elegans gene     BLAST E value
brABLl                                   M79.1A              1.00E-162
X-Linked Lymphoprol.-SH2D1A              M79.1A              2.00E-58
Cyclin Dep. Kinase 4-CDK4                F18H3.5A            1.00E-124
HNPCC*-PMS2                              H12C20.2A           1.00E-123
Neurofibromatosis 2-NF2                  C01G8.5A            5.00E-163
Duchenne MD+-DMD                         F32B4.3A            0.00E+00
Coffin-Lowry-RPS6KA3                     T01H8.1A            2.00E-13
Septooptic Dysplasia-HESX1               Y113G7A.6A          1.00E-152
Non-insulin dep. diabet.-[illegible]     [illegible]         [illegible]
Bartter's-SLC12A1                        Y37A1C.1A           1.00E-167
Gitelmans-SLC12A3                        Y37A1C.1A           0.00E+00
Hered. Spherocytosis-ANK1                B0350.2A            1.00E-09
Darier-White-SERCA                       K11D9.2A            0.00E+00
Spondyloepip. Dysp.-COL2A1               F01G12.5A/let-2     9.00E-20

Previously, other microarray analyses have been performed with the aim of detecting either
splicing of RNA transcripts per se in yeast, or of detecting putative exon skipping splicing
events in rat tissues, but neither of these approaches had sufficient resolution to estimate
quantities of splice variants, a factor that could be essential to an understanding of the changes
in cell life cycle and disease.

Thus, improved methods are needed for nucleic acid amplification, hybridization, and
classification. Desirable methods can distinguish between mRNA splice variants and quantitate
the amount of each variant in a sample. Other desirable methods can detect differences in
expression patterns between patient nucleic acid samples and nucleic acid standards.

Summary of the Invention 
The present invention demonstrates the usefulness of LNA-modified oligonucleotides in
the construction of highly specific and sensitive microarrays for expression profiling (e.g.,
mRNA splice variant detection) and comparative genomic hybridization. The invention
provides novel technology platforms based on nucleic acids with LNA or other high affinity
nucleotides for sensitive and specific assessment of alternative splicing using microarray
technology. As opposed to high-density cDNA or DNA oligonucleotide microarrays, LNA
microarrays are able to discriminate between highly homologous as well as differentially
spliced transcripts. The present methods greatly facilitate the analysis of gene expression
patterns from a particular species, tissue, or cell type. The analysis of the human spliceome
provides important information for pharmacogenetics. Thus, the present methods are highly
valuable in medical research and diagnostics as well as in drug development and toxicological
studies.

In general, the invention features populations of high affinity nucleic acids that have
duplex stabilizing properties and thus are useful for a variety of nucleic acid detection,
amplification, and hybridization methods (e.g., expression or mRNA splice variant profiling).
Some of these oligonucleotides contain novel nucleotides created by combining specialized
synthetic nucleobases with an LNA backbone, thus creating high affinity oligonucleotides with
specialized properties such as reduced sequence discrimination for the complementary strand or
reduced ability to form intramolecular double stranded structures. The invention also provides
improved methods for identifying nucleic acids in a sample and for classifying a nucleic acid
sample by comparing its pattern of hybridization to an array to the corresponding pattern of
hybridization of one or more standards to the array (e.g., comparative genomic hybridization).

Other desirable modified bases have decreased ability to self-anneal or to form duplexes
with oligonucleotides containing one or more modified bases. The invention also provides
arrays of nucleic acids containing these modified bases that have a decreased variance in
melting temperature and/or an increased capture efficiency compared to naturally-occurring
nucleic acids. These arrays as well as the oligonucleotides in solution can be used in a variety
of applications for the detection, characterization, identification, and/or amplification of one or
more target nucleic acids. These oligonucleotides and oligonucleotides of the invention in
general can also be used for solution assays, such as homogeneous assays.
Merged Probes 

In one aspect, the invention features a non-naturally-occurring nucleic acid with
a melting temperature that is at least 3, 5, 8, 10, 12, 15, 20, 25, 30, 35, or 40°C higher
than that of the corresponding control nucleic acid with 2'-deoxynucleotides. The
nucleic acid hybridizes to a first region within a first exon of a target nucleic acid and to
a second region within a second exon of the target nucleic acid that is adjacent to the
first exon.

In a related aspect, the invention provides a non-naturally-occurring nucleic acid
with a melting temperature that is at least 3, 5, 8, 10, 12, 15, 20, 25, 30, 35, or 40°C
higher than that of the corresponding control nucleic acid with 2'-deoxynucleotides.
The nucleic acid hybridizes to a first region within an exon of a target nucleic acid and
to a second region within an intron of the target nucleic acid that is adjacent to the exon.

In another aspect, the invention features a non-naturally-occurring nucleic acid
with a melting temperature that is at least 3, 5, 8, 10, 12, 15, 20, 25, 30, 35, or 40°C
higher than that of the corresponding control nucleic acid with 2'-deoxynucleotides.
The nucleic acid hybridizes to a first region within a first intron of a target nucleic acid
and to a second region within a second intron of the target nucleic acid that is adjacent
to the first intron.

In yet another aspect, the invention provides a nucleic acid that is a
non-naturally-occurring nucleic acid with a capture efficiency that is at least 10, 25, 50, 100,
150, 200, 500, 800, 1000, or 12000% greater than that of a corresponding control
nucleic acid with 2'-deoxynucleotides at the temperature equal to the melting
temperature of the nucleic acid. The nucleic acid hybridizes to a first region within a
first exon of a target nucleic acid and to a second region within a second exon of the
target nucleic acid that is adjacent to the first exon.

In a related aspect, the invention features a nucleic acid that is a
non-naturally-occurring nucleic acid with a capture efficiency that is at least 10, 25, 50, 100, 150,
200, 500, 800, 1000, or 12000% greater than that of a corresponding control nucleic
acid with 2'-deoxynucleotides at the temperature equal to the melting temperature of the
nucleic acid. The nucleic acid hybridizes to a first region within an exon of a target
nucleic acid and to a second region within an intron of the target nucleic acid that is
adjacent to the exon.

In yet another aspect, the invention provides a nucleic acid that is a
non-naturally-occurring nucleic acid with a capture efficiency that is at least 10, 25, 50, 100,
150, 200, 500, 800, 1000, or 12000% greater than that of a corresponding control
nucleic acid with 2'-deoxynucleotides at the temperature equal to the melting
temperature of the nucleic acid. The nucleic acid hybridizes to a first region within a
first intron of a target nucleic acid and to a second region within a second intron of the
target nucleic acid that is adjacent to the first intron.

In desirable embodiments, the nucleic acids of the invention featuring a
non-naturally occurring nucleic acid exhibit increased duplex stability due to slower rates of
dissociation of the nucleic acid complexes (the off-rate) (Christensen et al. 2001,
Biochem. J. 354: 481-484).

In still another aspect, the invention features a nucleic acid that is an LNA (i.e.,
a nucleic acid with one or more LNA units) and that hybridizes to a first region within
a first exon of a target nucleic acid and to a second region within a second exon of the
target nucleic acid that is adjacent to the first exon.

In another aspect, the invention features a nucleic acid that is an LNA and that
hybridizes to a first region within an exon of a target nucleic acid and to a second region
within an intron of the target nucleic acid that is adjacent to the exon.

In one aspect, the invention provides a nucleic acid that is an LNA and that
hybridizes to a first region within a first intron of a target nucleic acid and to a second
region within a second intron of the target nucleic acid that is adjacent to the first intron.

In desirable embodiments of any of the above aspects, the length of the segment
of the nucleic acid hybridizing to the first region and the length of the segment of the
nucleic acid hybridizing to the second region are between 3 and 50 nucleotides, 10 and
40 nucleotides, or 20 and 30 nucleotides, inclusive. The length of the segment of the
nucleic acid hybridizing to the first region and the length of the segment of the nucleic
acid hybridizing to the second region may be the same length or different lengths.
Desirably, the nucleic acid contains LNA units that are symmetrically spaced on both
sides of a junction between either two exons, an exon and an intron, or two introns.
Desirably, the nucleic acid has one or more LNA units within 5, 4, 3, 2, or 1 nucleotides
of a junction between either two exons, an exon and an intron, or two introns.
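As an illustration of the merged-probe geometry described above, the following sketch builds a capture probe complementary to a junction between two adjacent regions and places LNA units symmetrically around the junction. It is a simplified example under stated assumptions: the function names are hypothetical, uppercase letters are used here to denote LNA units and lowercase letters DNA units (a notation chosen for this sketch), and only a, c, g, and t are handled.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, in lowercase."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def merged_probe(upstream, downstream, arm_up, arm_down, lna_window=2):
    """Build a junction-spanning probe with LNA units flanking the junction.

    upstream, downstream: target sequences of the two adjacent regions
        (e.g., two exons), written 5' to 3'.
    arm_up, arm_down: number of target nucleotides covered on each side.
    lna_window: number of LNA units placed on each side of the junction
        (assumed convention: uppercase = LNA unit, lowercase = DNA unit).
    """
    if not 1 <= arm_up <= len(upstream) or not 1 <= arm_down <= len(downstream):
        raise ValueError("arm lengths must fit within the given regions")
    if not 0 <= lna_window <= min(arm_up, arm_down):
        raise ValueError("lna_window must not exceed the shorter arm")
    target = upstream[-arm_up:] + downstream[:arm_down]
    probe = reverse_complement(target)
    # After reverse complementing, the first arm_down probe bases pair with
    # the downstream region, so the junction sits at index arm_down.
    junction = arm_down
    return "".join(
        base.upper() if junction - lna_window <= i < junction + lna_window else base
        for i, base in enumerate(probe)
    )
```

For example, `merged_probe("aaaaccgg", "ttgcaaaa", 4, 4)` covers four target nucleotides on each side of the junction and marks the two probe positions on either side of it as LNA units.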

In another aspect, the invention features a population of nucleic acids that 
includes one or more nucleic acids of any one of the above aspects. 
Internal Probes 

In another aspect, the invention features a non-naturally-occurring nucleic acid
with a melting temperature that is at least 3, 5, 8, 10, 12, 15, 20, 25, 30, 35, or 40°C
higher than that of the corresponding control nucleic acid with 2'-deoxynucleotides.
The nucleic acid hybridizes to only one exon or to only one intron of a target nucleic
acid.

In a related aspect, the invention features a non-naturally-occurring nucleic acid
with a capture efficiency that is at least 10, 25, 50, 100, 150, 200, 500, 800, 1000, or
12000% greater than that of a corresponding control nucleic acid with
2'-deoxynucleotides at the temperature equal to the melting temperature of the nucleic
acid. The nucleic acid hybridizes to only one exon or to only one intron of a target
nucleic acid.

In another aspect, the invention features a nucleic acid that is an LNA and that
hybridizes to only one exon or to only one intron of a target nucleic acid.

In desirable embodiments of the above aspects for nucleic acids that hybridize
to only one exon or only one intron, the nucleic acid does not hybridize to both an exon
and an intron.

In another aspect, the invention features a population of nucleic acids that 
includes one or more nucleic acids of any one of the above aspects. 
Pharmaceutical Compositions and Nucleic Acid Populations 

In another aspect, the invention features a pharmaceutical composition that includes one
or more of the nucleic acids of the invention and a pharmaceutically acceptable carrier, such as
one of the carriers described herein.

In another aspect, the invention features a population of two or more nucleic acids of the
invention. The populations of nucleic acids of the invention may contain any number of unique
molecules. For example, the population may contain as few as 10, 10^2, 10^4, or 10^5 unique
molecules or as many as 10^7, 10^8, 10^9 or more unique molecules. In desirable embodiments, at
least 1, 5, 10, 50, 100 or more of the polynucleotide sequences are non-naturally-occurring
sequences. Desirably, at least 20, 40, or 60% of the unique polynucleotide sequences are
non-naturally-occurring sequences. Desirably, the nucleic acids are all the same length; however,
some of the molecules may differ in length.

Desirable Embodiments of Any of the Above Aspects

In desirable embodiments of any of the above aspects, the length of one or more
nucleic acids (e.g., nucleic acids in a nucleic acid population of the invention) is
between 15 and 150 nucleotides, 5 and 100 nucleotides, 20 and 80 nucleotides, or 30
and 60 nucleotides in length, inclusive. In particular embodiments, the nucleic acid is 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50 nucleotides or at least 60, 70,
80, 90, 100, 120, or 130 nucleotides in length. In additional embodiments, the nucleic
acid is between 8 and 40 nucleotides, such as between 9 and 30, or 12 and 25, or 15 and
20 nucleotides. Desirably, at least 5, 10, 15, 20, 30, 40, 50, 60, or 70% of the
nucleotides in the nucleic acid are LNA units. In desirable embodiments, every second
nucleotide, every third, every fourth nucleotide, every fifth nucleotide, or every sixth
nucleotide in the nucleic acid is an LNA unit. In various embodiments, (i) every second
and every third nucleotide, (ii) every second and every fourth nucleotide, (iii) every
second and every fifth nucleotide, (iv) every second and every sixth nucleotide, (v)
every third and every fourth nucleotide, (vi) every third and every fifth nucleotide, (vii)
every third and every sixth nucleotide, (viii) every fourth and every fifth nucleotide, (ix)
every fourth and every sixth nucleotide, and/or (x) every fifth and every sixth nucleotide
in the nucleic acid is an LNA unit. Desirably, every second, every third, and every
fourth nucleotide in the nucleic acid is an LNA unit. In desirable embodiments, the
nucleic acids of the invention have one or more of the following substitution patterns
which is repeated throughout the nucleic acids: XxXx, xXxX, XxxXxx, xXxxXx,
xxXxxX, XxxxXxxx, xXxxxXxx, xxXxxxXx, or xxxXxxxX in which "X" denotes an
LNA unit and "x" denotes a DNA or RNA unit. In some embodiments, the nucleotides
that are not LNA units are naturally-occurring DNA or RNA nucleotides.
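The repeated substitution patterns listed above can be made concrete with a short sketch that tiles a pattern such as "Xxx" along a sequence. The function names are hypothetical, and the output notation (uppercase letter = LNA unit, lowercase letter = DNA or RNA unit) is an assumption adopted for this example rather than a convention stated in the text.

```python
def apply_lna_pattern(sequence, pattern):
    """Tile a substitution pattern along a sequence and mark the LNA units.

    pattern: string of 'X' (LNA unit) and 'x' (DNA or RNA unit), e.g. "XxxXxx".
    Returns the sequence with LNA positions in uppercase and all other
    positions in lowercase (notation assumed for this sketch).
    """
    if not pattern or set(pattern) - {"X", "x"}:
        raise ValueError("pattern must consist only of 'X' and 'x'")
    marked = []
    for index, base in enumerate(sequence):
        if pattern[index % len(pattern)] == "X":
            marked.append(base.upper())
        else:
            marked.append(base.lower())
    return "".join(marked)


def lna_fraction(marked):
    """Return the fraction of positions that are LNA units (uppercase)."""
    if not marked:
        raise ValueError("sequence is empty")
    return sum(1 for base in marked if base.isupper()) / len(marked)
```

With the pattern "Xxx", every third nucleotide is an LNA unit, so a 12-mer carries four LNA units, or about 33% of its nucleotides.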

In various embodiments, the nucleic acid comprises two or more contiguous
LNA units. Desirably, the nucleic acid comprises at least 2, 3, 4, 5, 6, 7, or 8 contiguous
LNA units. In desirable embodiments, the number of contiguous LNA units is between
5 and 20% or 10 and 15% of the total length of the nucleic acid. In a particular
embodiment, 5 contiguous nucleotides of a 50-mer merged probe are LNA units. In one
embodiment, the nucleic acid does not have greatly extended stretches of modified
DNA or RNA residues, e.g. greater than about 4, 5, 6, 7, or 8 consecutive modified
DNA or RNA residues. According to this embodiment, one or more non-modified
DNA or RNA units are present after a consecutive stretch of about 3, 4, 5, 6, 7, or 8
modified nucleic acids.
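The contiguous-stretch criteria above lend themselves to a simple check. The sketch below measures the longest run of LNA units in a probe and expresses it as a fraction of the probe length; the function names are hypothetical, and the notation (uppercase = LNA unit, lowercase = unmodified DNA or RNA unit) is an assumption made for this example.

```python
def longest_lna_run(marked):
    """Return the length of the longest run of consecutive LNA units.

    LNA units are assumed to be written in uppercase, other units in lowercase.
    """
    longest = current = 0
    for base in marked:
        current = current + 1 if base.isupper() else 0
        longest = max(longest, current)
    return longest


def contiguous_lna_fraction(marked):
    """Return the longest LNA run as a fraction of the total probe length."""
    if not marked:
        raise ValueError("sequence is empty")
    return longest_lna_run(marked) / len(marked)
```

For the particular embodiment mentioned above, a 50-mer with 5 contiguous LNA units gives a fraction of 0.10, which lies within both the 5 to 20% and the 10 to 15% ranges.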

Other desirable nucleic acids have an LNA substitution pattern that results in the
formation of negligible secondary structure by the nucleic acid with itself. In one such
embodiment, the nucleic acids do not form hairpins or do not form other secondary
structures that would otherwise inhibit or prevent their binding to a target nucleic acid.
Desirably, opposing nucleotides in a palindrome pair or opposing nucleotides in
inverted repeats or in reverse complements are not both LNA units. In various
embodiments, the nucleic acids in the first population form less than 3, 2, or 1
intramolecular base-pairs or base-pairs between two identical molecules. In desirable
embodiments, the nucleic acid does not have LNA-5-nitroindole:LNA-5-nitroindole
intramolecular base-pairs.

In other desirable embodiments, at least one LNA unit (e.g., at least 2, 3, 4, 5, 6,
7, 8, 9, or 10 LNA units) in the nucleic acid hybridizes to a first region within a first
exon of a target nucleic acid and at least one LNA unit (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9,
or 10 LNA units) in the nucleic acid hybridizes to a second region within a second exon
of the target nucleic acid that is adjacent to the first exon. The number of LNA units
that bind to each region can be the same or different. In some embodiments, the 5'
terminal nucleotide of the nucleic acid is or is not an LNA unit. Desirably, the 3'
terminal nucleotide of the nucleic acid is not an LNA unit (e.g., the nucleic acid may
contain a 3' terminal naturally-occurring nucleotide).

Desirably, the nucleic acid can distinguish between different nucleic acids (e.g., mRNA
splice variants) that cannot be distinguished using a naturally-occurring control nucleic acid
(e.g., a control nucleic acid that consists of only 2'-deoxynucleotides such as a control nucleic
acid of the same length as the nucleic acid of the invention). Desirably, the hybridization
intensity of the nucleic acid for an exon of interest is at least 2, 3, 4, 5, 6, or 10 fold greater than
the hybridization intensity of the nucleic acid for another exon in the same target nucleic acid
(e.g., mRNA) or in another nucleic acid. Desirably, the hybridization intensity of the nucleic
acid for target nucleic acid is at least 2, 3, 4, 5, 6, or 10 fold greater than the hybridization
intensity for a non-target nucleic acid with less than 99, 95, 90, 80, 70, or 60% sequence
identity to the target nucleic acid.

Desirably, all of the nucleic acids of the population or all of the nucleic acids of
a subpopulation of the population are the same length. In some embodiments, the
population includes one or more nucleic acids of a different length. In some
embodiments, longer nucleic acids contain one or more nucleotides with universal
bases. For example, nucleotides with universal bases can be used to increase the
thermal stability of nucleic acids that would otherwise have a thermal stability lower
than some or all of the nucleic acids in the population. In some embodiments, one or
more nucleic acids have a universal base located at the 5' or 3' terminus of the nucleic
acid. In desirable embodiments, one or more (e.g., 2, 3, 4, 5, 6, or more) universal bases
are located at the 5' and 3' termini of the nucleic acid. Desirably, all of the nucleic acids
in the population have the same number of universal bases. Desirable universal bases
include inosine, pyrene, 3-nitropyrrole, and 5-nitroindole.

In desirable embodiments, the nucleic acid has at least one LNA A or LNA T.
In some embodiments, each nucleic acid has at least one LNA A or LNA T. Desirably,
all of the adenine and thymine-containing nucleotides in the LNA are LNA A and LNA
T, respectively. In some embodiments, a nucleic acid with an increased capture
efficiency or melting temperature compared to a control nucleic acid has at least one
LNA T or LNA C. In some embodiments, all of the thymidine and cytosine-containing
nucleotides in the LNA are LNA T and LNA C, respectively. In some embodiments, a
nucleic acid with an increased specificity or decreased self-complementarity compared
to a control nucleic acid has at least one LNA A or LNA C. In some embodiments, all
of the adenine and cytosine-containing nucleotides in the LNA are LNA A and LNA C,
respectively. Desirably, at least 10, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100% of the
nucleic acids in the population have one or more LNA units.

In desirable embodiments, the LNA has at least one 2,6-diaminopurine, 2-aminopurine,
2-thio-thymine, 2-thio-uracil, inosine, or hypoxanthine base. Desirably,
the LNA has a nucleotide with a 2'-O,4'-C-methylene linkage between the 2' and 4'
positions of a sugar moiety. In some embodiments, one or more nucleic acids in the first
population are LNA/DNA, LNA/RNA, or LNA/DNA/RNA chimeras.

In desirable embodiments of any of the above aspects, the variance in the
melting temperature of the population is at least 10, 20, 30, 40, 50, 60, or 70% less than
the variance in the melting temperature of the corresponding control population of
nucleic acids of the same length with 2'-deoxynucleotides (e.g., DNA nucleotides)
instead of LNA units or other modified or non-naturally-occurring units. In desirable
embodiments, the standard deviation in melting temperature is less than 10, 9.5, 9, 8.5,
8, 7.5, 7, 6.5, or 6. In certain embodiments, the range in melting temperatures for nucleic
acids in the population is less than 70, 60, 50, 40, 30, or 20°C. Desirably, the variance
in the melting temperature of the population is less than 59, 50, 40, 30, 25, 20, 15, 10, or
5.
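By way of illustration only, the spread statistics named above (variance, standard deviation, and range of melting temperatures) and the percent reduction in variance relative to a control population can be computed as in the following Python sketch. The application does not state whether a population or a sample variance is intended; the population form (`statistics.pvariance`) is an assumption here, and the function names are hypothetical.

```python
from statistics import pstdev, pvariance


def tm_spread(tms):
    """Summarize the spread of a list of melting temperatures (degrees C).

    Uses the population variance and standard deviation, which is an
    assumption; the text does not specify population versus sample form.
    """
    return {
        "variance": pvariance(tms),
        "stdev": pstdev(tms),
        "range": max(tms) - min(tms),
    }


def variance_reduction_percent(population_tms, control_tms):
    """Percent by which the population Tm variance is lower than the
    variance of the corresponding control (e.g., all-DNA) population."""
    return 100.0 * (1.0 - pvariance(population_tms) / pvariance(control_tms))
```

For example, with hypothetical values of 70, 72, and 74°C for an LNA-containing population and 50, 60, and 70°C for the control, the variance of the population is about 96% lower than that of the control.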

In still other embodiments, the nucleic acids are covalently bonded to a solid
support. Desirably, the nucleic acids are in a predefined arrangement. In various
embodiments, the first population has at least 10; 100; 1,000; 5,000; 10,000; 100,000; or
1,000,000 different nucleic acids. Desirably, the nucleic acids in the population
together hybridize to at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the exons
of a target nucleic acid. In desirable embodiments, the population includes nucleic acids
that together hybridize to at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the
nucleic acids expressed by a particular cell or tissue. In some embodiments, the
population includes nucleic acids that together hybridize to at least one exon from at
least 1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100% of the nucleic acid sequences
expressed by a particular cell or tissue at a given point in time (e.g., an expression array
with sequences corresponding to the sequences of mRNA molecules expressed by a
particular cell type or a cell under a particular set of conditions). In some embodiments,
the plurality of nucleic acids are used as PCR primers or FISH probes.

Desirable modified bases of the present invention when incorporated into the central
position of a 9-mer oligonucleotide (all other eight residues or units being natural DNA or RNA
units with natural bases) exhibit a Tm difference equal to or less than about 15, 12, 10, 9, 8, 7, 6,
5, 4, 3 or 2°C upon hybridizing to the four complementary oligonucleotide variants that are
identical except for the unit corresponding to the LNA unit, where each variant has one of the
natural bases uracil, cytosine, thymine, adenine or guanine. That is, the difference between the
highest and the lowest Tm (referred to herein as the Tm differential) obtained with such four
complementary sequences is 15, 12, 10, 9, 8, 7, 6, 5, 4, 3 or 2°C or less.
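By way of illustration only, the Tm differential can be computed as the highest minus the lowest Tm measured against the complementary variants, as in the following Python sketch. The function names and the dictionary layout (opposing base mapped to measured Tm in °C) are assumptions made for this example.

```python
def tm_differential(tm_by_opposing_base):
    """Highest minus lowest Tm (degrees C) across the complementary variants.

    `tm_by_opposing_base` maps the base opposite the modified unit
    (e.g., "A", "C", "G", "T") to the measured Tm; this layout is assumed.
    """
    values = list(tm_by_opposing_base.values())
    return max(values) - min(values)


def meets_threshold(tm_by_opposing_base, max_differential_c):
    """True if the Tm differential does not exceed the chosen threshold."""
    return tm_differential(tm_by_opposing_base) <= max_differential_c
```

With hypothetical measurements of 30.0, 28.5, 31.0, and 29.0°C opposite A, C, G, and T, the Tm differential is 2.5°C, which satisfies a 3°C threshold but not a 2°C threshold.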

Modified nucleic acid oligomers of the invention desirably contain at least one LNA
unit, such as an LNA unit with a modified nucleobase. Modified nucleobases or nucleosidic
bases desirably base-pair with adenine, guanine, cytosine, uracil, or thymine. Exemplary
oligomers contain 2 to 100, 5 to 100, 4 to 50, 5 to 50, 5 to 30, or 8 to 15 nucleic acid units. In
some embodiments, one or more LNA units with natural nucleobases are incorporated into the
oligonucleotide at a distance from the LNA unit having a modified base of 1 to 6 (e.g., 1 to 4)
bases. In certain embodiments, at least two LNA units with natural nucleobases are flanking an
LNA unit having a modified base. Desirably, at least two LNA units independently are
positioned at a distance from the LNA unit having the modified base of 1 to 6 (e.g., 1 to 4)
bases.

Desirable modified nucleobases or nucleosidic bases for use in nucleic acid
compositions of the invention include optionally substituted carbon alicyclic or carbocyclic aryl
groups (i.e., only carbon ring members), particularly multi-ring carbocyclic aryl groups such as
groups having 2, 3, 4, 5, 6, 7, or 8 linked, particularly fused carbocyclic aryl moieties.
Optionally substituted pyrene is also desirable. Such nucleobases or nucleosidic bases can
provide significant performance results, as demonstrated in the examples which follow.
Heteroalicyclic and heteroaromatic nucleobases or nucleosidic bases also are suitable. In some
embodiments, the carbocyclic moiety is linked to the 1'-position of the LNA unit through a
linker (e.g., a branched or straight alkylene or alkenylene).

Desirable LNA units have a carbon or hetero alicyclic ring with four to six ring
members, e.g. a furanose ring, or other alicyclic ring structures such as a cyclopentyl,
cycloheptyl, tetrahydropyranyl, oxepanyl, tetrahydrothiophenyl, pyrrolidinyl, thianyl, thiepanyl,
piperidinyl, and the like. In one aspect, at least one ring atom of the carbon or hetero alicyclic
group is taken to form a further cyclic linkage to thereby provide a multi-cyclic group. The
cyclic linkage may include one or more, typically two atoms, of the carbon or hetero alicyclic
group. The cyclic linkage also may include one or more atoms that are substituents, but not
ring members, of the carbon or hetero alicyclic group. Other desirable LNA units are
compounds having a substituent on the 2'-position of the central sugar moiety (e.g., ribose or
xylose), or derivatives thereof, which favors the C3'-endo conformation, commonly referred to
as the North (or simply N for short) conformation. These LNA units include ENA
(2'-O,4'-C-ethylene-bridged nucleic acids such as those disclosed in WO 00/47599) units as
well as non-bridged riboses such as 2'-F or 2'-O-methyl.

For any of the above aspects, an exemplary control nucleic acid has β-D-2-deoxyribose
instead of one or more bicyclic or sugar groups of a LNA unit or other
modified or non-naturally-occurring units in a nucleic acid of the first population. In
some embodiments, the nucleic acid or population of the invention and the control
nucleic acid or population only have naturally-occurring nucleobases. If a nucleic acid
in the nucleic acid or population of the invention has one or more non-naturally-
occurring nucleobases, the capture efficiency of the corresponding control nucleic acid
is calculated as the average capture efficiency for all of the nucleic acids that have either
A, T, C, G or mC (methyl cytosine) in each position corresponding to a non-naturally-
occurring nucleobase in the nucleic acid in the first population.
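By way of illustration only, the averaging described above can be sketched in Python for the simplest case of a single position carrying a non-naturally-occurring nucleobase. The function name and the dictionary layout (substituted natural base mapped to the measured capture efficiency of that control variant) are assumptions made for this example; a nucleic acid with several such positions would require averaging over all combinations, which this sketch does not cover.

```python
from statistics import mean

# Natural bases substituted at the position of the non-natural nucleobase.
NATURAL_BASES = ("A", "T", "C", "G", "mC")


def control_capture_efficiency(efficiency_by_base):
    """Average capture efficiency over the control variants that carry
    A, T, C, G, or mC at the single position of interest.

    `efficiency_by_base` maps each natural base to the measured capture
    efficiency of the corresponding control nucleic acid (assumed layout).
    """
    return mean(efficiency_by_base[base] for base in NATURAL_BASES)
```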

Complex of Target Nucleic Acids and Nucleic Acid Probes

In one aspect, the invention features a complex of one or more target nucleic
acids and nucleic acids of the invention (e.g., nucleic acid probes) in which one or more
target nucleic acids are hybridized to a plurality of nucleic acids of the invention.
Desirably, at least 2, 3, 4, 5, 6, 7, 10, 15, 20, 30, or 40 different target nucleic acids are
hybridized. In some embodiments, the target nucleic acids are cDNA molecules reverse
transcribed from a patient sample or cRNA molecules amplified from a patient sample
using a T7 RNA polymerase-based linear amplification system or the like. The target
nucleic acids are labeled prior to hybridization to the nucleic acids of the invention.

Methods for Detecting or Amplifying Target Nucleic Acids

In one aspect, the invention features a method for detecting the presence of one
or more target nucleic acids in a sample. This method involves incubating a nucleic
acid sample with one or more nucleic acids of the invention under conditions that allow
at least one target nucleic acid to hybridize to at least one of the nucleic acids of the
invention. Desirably, hybridization is detected for at least 2, 3, 4, 5, 6, 8, 10, or 12 target
nucleic acids. In some embodiments, the method further includes contacting the target
nucleic acid with a second nucleic acid or a population of second nucleic acids that
binds to a different region of the target molecule than the first nucleic acid. Desirably,
the method further involves identifying one or more hybridized target nucleic acids and/or
determining the amount of one or more hybridized target nucleic acids. In desirable
embodiments, the method further includes determining the presence or absence of an
mRNA splice variant of interest in the sample and/or determining the presence or
absence of a mutation, deletion, and/or duplication of an exon of interest. In some
embodiments, the mutation, deletion, and/or duplication is indicative of a disease,
disorder, or condition, such as cancer.

In desirable embodiments of any of the above detection methods, at least 5, 10,
15, 20, 30, 40, 50, 80, 100, 150, 200, or more target nucleic acids hybridize to the
nucleic acids of the invention. Desirably, the method is repeated under one or more
different incubation conditions. In particular embodiments, the method is repeated at 1,
3, 5, 8, 10, 15, 20, 30, 40 or more different temperatures, cation concentrations (e.g.,
concentrations of monovalent cations such as Na+ and K+ or divalent cations such as
Mg2+ and Ca2+), or denaturants (e.g., hydrogen bond donors or acceptors that interfere with
the hydrogen bonds keeping the base-pairs together such as formamide or urea).
Desirably, the method also includes identifying the target nucleic acid hybridized to the
nucleic acids of the invention and/or determining the amount of the target nucleic acid
hybridized to the nucleic acids of the invention. In particular embodiments, the target
nucleic acids are labeled with a fluorescent group. In certain embodiments, the labeling
is repeated using different fluorescent groups (e.g., labeling for so-called dye-swap
experiments). In desirable embodiments, the determination of the amount of
bound target nucleic acid involves one or more of the following: (i) adjusting for the
varying intensity of the excitation light source used for detection of the hybridization,
(ii) adjusting for photobleaching of the fluorescent group, and/or (iii) comparing the
fluorescent intensity of the target nucleic acid(s) hybridized to the nucleic acids of the
invention to the fluorescent intensity of a different sample of nucleic
acids hybridized to the nucleic acids of the invention (e.g., a different sample hybridized
to the same population of nucleic acids of the invention on the same or a different solid
support such as the same chip or a different chip). Desirably, this comparison in
fluorescent intensity involves adjusting for a difference in the amount of the nucleic
acids of the invention used for hybridization to each sample and/or adjusting for a
difference in the buffer (e.g., a difference in Mg2+ concentration) used for hybridization
to each sample or scaling for different labeling efficiencies with different
fluorochromes.
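By way of illustration only, one common way to scale for different labeling efficiencies in a dye-swap experiment is to average the log ratios of the two hybridizations, so that a multiplicative dye bias cancels. The following Python sketch shows that calculation; it is an assumed approach, not one prescribed by the application, and the function names are hypothetical.

```python
import math


def log_ratio(test_intensity, reference_intensity):
    """Base-2 log ratio of test to reference fluorescent intensity."""
    return math.log2(test_intensity / reference_intensity)


def dye_swap_average(forward, swapped):
    """Average the log ratios from a dye-swap pair.

    `forward` and `swapped` are (test, reference) intensity pairs from the
    two hybridizations, with the dyes exchanged between them. A constant
    multiplicative bias of one dye adds to one log ratio and subtracts
    from the other, so it cancels in the average.
    """
    return 0.5 * (log_ratio(*forward) + log_ratio(*swapped))
```

For example, if the true test and reference signals are 400 and 100 and one dye reads twice as bright as the other, the forward pair is (800, 100) and the swapped pair is (400, 200); the averaged log ratio is 2, corresponding to the true four-fold difference.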

Desirably, the target nucleic acids are cDNA molecules reverse transcribed from
a patient sample or cRNA molecules amplified using a T7 RNA polymerase-based
linear amplification system or the like from a patient sample. In particular
embodiments, the sample has nucleic acids that are amplified using one or more primers
specific for an exon of a target nucleic acid, and the method involves determining the
presence or absence of an mRNA splice variant with the exon in the sample. Desirably,
one or more of the primers are specific for an exon or exon-exon junction of a pathogen
of interest, and the method involves determining the presence or absence of a nucleic
acid with the exon in the sample.

In a desirable embodiment, the nucleic acids of the invention are covalently bonded to a 
solid support by reaction of a nucleoside phosphoramidite with an activated solid support, and
subsequent reaction of a nucleoside phosphoramide with an activated nucleotide or nucleic acid 
bound to the solid support. In some embodiments, the solid support or the growing nucleic acid 
bound to the solid support is activated by illumination, a photogenerated acid, or electric 
current. 

In another aspect, the invention features a method for amplifying a target nucleic acid
molecule. The method involves (a) incubating a first nucleic acid of the invention with a target
nucleic acid under conditions that allow the first nucleic acid to bind the target nucleic acid; and
(b) extending the first nucleic acid with the target nucleic acid as a template. Desirably, the
method further involves contacting the target nucleic acid with a second nucleic acid (e.g., a
second nucleic acid of the invention) that binds to a different region of the target nucleic acid
than the first nucleic acid. In various embodiments, the sequence of the target nucleic acid is
known or unknown.

In one aspect, the invention features a method of detecting a nucleic acid of a pathogen
(e.g., a nucleic acid in a sample such as a blood or urine sample from a mammal). This method
involves contacting a nucleic acid probe of the invention (e.g., a probe specific for an exon or an
mRNA from a particular pathogen or family of pathogens) with a nucleic acid sample under
conditions that allow the probe to hybridize to at least one nucleic acid in the sample. The
probe is desirably at least 60, 70, 80, 90, 95, or 100% complementary to a nucleic acid of a
pathogen (e.g., a bacterium, virus, or yeast such as any of the pathogens described herein).
Hybridization between the probe and a nucleic acid in the sample is detected, indicating that the
sample contains the corresponding nucleic acid from a pathogen. In some embodiments, the
method is used to determine what strain of a pathogen has infected a mammal (e.g., a human)
by determining whether a particular nucleic acid is present in the sample. In other
embodiments, the probe has a universal base in a position corresponding to a nucleotide that
varies among different strains of a pathogen, and thus the probe detects the presence of a
nucleic acid from any of multiple pathogenic strains.

Methods for Classifying Nucleic Acid Samples

In one aspect, the invention features a method for classifying a test nucleic acid
sample including target nucleic acids. This method involves (a) incubating a test
nucleic acid sample with one or more nucleic acids of the invention under conditions
that allow at least one of the nucleic acids in the test sample to hybridize to at least one
nucleic acid of the invention, (b) detecting a hybridization pattern of the test nucleic
acid sample, and (c) comparing the hybridization pattern to a hybridization pattern of a
first nucleic acid standard, whereby the comparison indicates whether or not the test
sample has the same classification as the first standard. Desirably, the method also
includes comparing a hybridization pattern of the test nucleic acid sample to a
hybridization pattern of a second standard. In various embodiments, a hybridization
pattern of the test nucleic acid sample is compared to at least 3, 4, 5, 8, 10, 15, 20, 30,
40, or more standards.
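By way of illustration only, step (c) can be sketched in Python by assigning the test sample to the standard whose hybridization pattern it most resembles. The use of the Pearson correlation as the similarity measure is an assumption made for this example (the application does not prescribe a comparison metric), as are the function names and the representation of a pattern as a list of per-probe intensities in a fixed probe order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length intensity patterns.

    Patterns with no variation would cause a division by zero and are
    not handled in this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify(test_pattern, standards):
    """Return the name of the standard whose pattern correlates best
    with the test pattern. `standards` maps a name to a pattern."""
    return max(standards, key=lambda name: pearson(test_pattern, standards[name]))
```

In practice a minimum correlation would likely be required before the test sample is assigned the same classification as a standard; that cutoff is left out of this sketch.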

Desirably, the method also includes identifying the hybridized target nucleic
acid and/or determining the amount of hybridized target nucleic acid. In particular
embodiments, the target nucleic acids are labeled with a fluorescent group. Desirably,
the first nucleic acid standard is labeled with a different fluorescent group. The
fluorescence of the target nucleic acids and the first nucleic acid standard can be
detected simultaneously or sequentially.

In desirable embodiments, the method further includes determining the presence or 
absence of an mRNA splice variant of interest in the sample and/or determining the 
presence or absence of a mutation, deletion, and/or duplication of an exon of interest. In 
some embodiments, the mutation, deletion, and/or duplication is indicative of a disease,
disorder, or condition, such as cancer.

In desirable embodiments, the determination of the amount of bound target
nucleic acid involves one or more of the following: (i) adjusting for the varying intensity
of the excitation light source used for detection of the hybridization, (ii) adjusting for
photobleaching of the fluorescent group, and/or (iii) comparing the fluorescent intensity
of the target nucleic acid(s) hybridized to the nucleic acids of the invention to the
fluorescent intensity of a different sample of nucleic acids hybridized to the nucleic
acids of the invention (e.g., a different sample hybridized to the same set of nucleic acids of
the invention on the same or a different solid support such as the same chip or a
different chip). Desirably, this comparison in fluorescent intensity involves adjusting
for a difference in the amount of the plurality used for hybridization to each sample
and/or adjusting for a difference in the buffer (e.g., a difference in Mg2+ concentration)
used for hybridization to each sample.

Desirably, the nucleic acids in the population together hybridize to at least 10,
20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the exons of a target nucleic acid. In
desirable embodiments, the population includes nucleic acids that together hybridize to
at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the nucleic acids expressed by
a particular cell or tissue. In some embodiments, the population includes nucleic acids
that together hybridize to at least one exon from at least 1, 5, 10, 20, 25, 30, 40, 50, 60,
70, 80, 90, or 100% of the nucleic acid sequences expressed by a particular cell or tissue
at a given point in time (e.g., an expression array with sequences corresponding to the
sequences of mRNA molecules expressed by a particular cell type or a cell under a
particular set of conditions). Desirably, the method further includes using a nucleic acid
or a region of a nucleic acid that is present in a first test sample but not present in a first
standard or not present in a second test sample as a probe or primer for the detection,
amplification, or characterization of the nucleic acid.

In desirable embodiments of any of the above methods, at least 5, 10, 15, 20, 30,
40, 50, 80, 100, 150, 200, or more target nucleic acids hybridize to the nucleic acids of
the invention. Desirably, the method is repeated under one or more different incubation
or hybridization conditions. In particular embodiments, the method is repeated at 1, 3,
5, 8, 10, 15, 20, 30, 40 or more different temperatures, cation concentrations (e.g.,
concentrations of monovalent cations such as Na+ and K+ or divalent cations such as
Mg2+ and Ca2+), or denaturants (e.g., hydrogen bond donors or acceptors that interfere with
the hydrogen bonds keeping the base-pairs together such as formamide or urea).

In particular embodiments, the sample has nucleic acids that are amplified using
one or more primers specific for an exon of a target nucleic acid, and the method
involves determining the presence or absence of an mRNA splice variant with the exon 
in the sample. Desirably, one or more of the primers are specific for an exon or exon- 
exon junction of a pathogen of interest, and the method involves determining the 
presence or absence of a nucleic acid with the exon in the sample. 

Desirably, the comparison of the hybridization pattern of a patient nucleic acid
sample to that of one or more standards is used to determine whether or not a patient has
a particular disease, disorder, condition, or infection or an increased risk for a particular
disease, disorder, condition, or infection. In some embodiments, the comparison is used
to determine what pathogen has infected a patient and to select a therapeutic for the
treatment of the patient. Desirably, the comparison is used to select a therapeutic for the
treatment or prevention of a disease or disorder in the patient. In yet other
embodiments, the comparison is used to include or exclude the patient from a group in a
clinical trial. Desirably, the comparison is used to compare the expression of nucleic
acids (e.g., mRNA splice forms associated with toxicity) in the presence and absence of a
candidate compound (e.g., a lead compound for drug development). In other
embodiments, the comparison is used to determine differences in expression of nucleic
acids (e.g., mRNA splice variants) under particular conditions (e.g., under different
environmental stress conditions) or at different developmental time points. In particular
embodiments, the expression of one or more members from a particular enzyme class
(e.g., protein kinase splice variants) is measured.

In a desirable embodiment, the nucleic acids of the invention are covalently bonded to a 
solid support by reaction of a nucleoside phosphoramidite with an activated solid support, and 
subsequent reaction of a nucleoside phosphoramide with an activated nucleotide or nucleic acid 
bound to the solid support. In some embodiments, the solid support or the growing nucleic acid 
bound to the solid support is activated by illumination, a photogenerated acid, or electric
current. 

The use of a variety of different monomers in the nucleic acids of the invention offers a 
means to "fine tune" the chemical, physical, biological, pharmacokinetic, and pharmacological 
properties of the nucleic acids thereby facilitating improvement in their safety and efficacy
profiles when used as a therapeutic drug.

Applications for the Nucleic Acids of the Invention 

In another aspect, the invention features the use of one or more nucleic acids of
the invention for the detection, amplification, or classification of a nucleic acid of 
interest or a population of nucleic acids of interest. 

In another aspect, the invention features the use of one or more nucleic acids of
the invention for alternative mRNA splice variant detection, expression profiling,
comparative genomic hybridization, or real-time PCR. In exemplary real-time PCR
applications, the nucleic acids are used to determine the amount of one or more target
nucleic acids (e.g., mRNA splice variants) in a sample. In particular embodiments,
fluorescently labeled RT-PCR products from the amplification of a test nucleic acid
sample are hybridized to a population of nucleic acids of the invention. Desirably, the
amount of one or more RT-PCR products is measured to determine the amount of the
corresponding nucleic acid in the initial sample.

In yet another aspect, the invention features the use of a nucleic acid of the invention as a
PCR primer or FISH probe.

Methods for Selecting a Population of Nucleic Acids

In one aspect, the invention features a method of selecting a nucleic acid for a
population of nucleic acids. This method involves (a) determining the melting
temperature of a nucleic acid of the invention, determining the ability of the nucleic acid
to self-anneal, determining the ability of the nucleic acid to hybridize to one or more
exons or introns of a target nucleic acid, and/or determining the ability of the nucleic
acid to hybridize to a non-target nucleic acid, and (b) selecting the nucleic acid for
inclusion or exclusion from the population based on the determination in step (a). In
desirable embodiments, step (a) is performed for at least 2, 3, 4, 5, 6, 10, 20, 50, 100,
200, 500, 1,000, 5,000 or more nucleic acids, and a subset of the nucleic acids are
selected for inclusion in the population based on the determination in step (a).
Desirably, the nucleic acids with the highest melting temperatures and/or ability to
hybridize to one or more exons or introns of a target nucleic acid are selected.
Desirably, the nucleic acids with the lowest ability to self-anneal and/or hybridize to a
non-target nucleic acid are selected.
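By way of illustration only, steps (a) and (b) can be sketched in Python as a filter followed by a ranking. The scoring fields, the cutoffs, and the names `Candidate` and `select` are hypothetical; the application does not specify how self-annealing or non-target hybridization is quantified, so dimensionless scores in which lower is better are assumed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    tm: float           # melting temperature, degrees C
    self_anneal: float  # assumed score, lower means less self-annealing
    off_target: float   # assumed score, lower means less non-target binding


def select(candidates, max_self_anneal, max_off_target, keep):
    """Exclude candidates above either cutoff, then keep the `keep`
    remaining candidates with the highest melting temperatures."""
    passing = [
        c for c in candidates
        if c.self_anneal <= max_self_anneal and c.off_target <= max_off_target
    ]
    return sorted(passing, key=lambda c: c.tm, reverse=True)[:keep]
```

Ranking by melting temperature alone after filtering is one possible design choice; a weighted score combining all of the determinations in step (a) would be an equally reasonable alternative.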

Databases with Hybridization Patterns of Nucleic Acid Samples and/or Standards

The invention also features a variety of databases. These databases are useful for 
storing the information obtained in any of the methods of the invention. These databases may 
also be used in the diagnosis of disease or an increased risk for a disease or in the selection of a
desirable therapeutic for a particular patient or class of patients.

Accordingly, in one such aspect, the invention provides an electronic database including
at least 1, 10, 10^2, 10^3, 5 x 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, or 10^9 records of a nucleic acid of interest
or a population of nucleic acids of interest (e.g., one or more nucleic acids in a standard or in a
test nucleic acid sample) correlated to records of its hybridization pattern to a plurality of
nucleic acids of the invention under one or more incubation conditions (e.g., one or more
temperatures, denaturant concentrations, or salt concentrations).
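By way of illustration only, such a database could be laid out as in the following Python sketch, which uses the standard-library `sqlite3` module with an in-memory database. The table name, column names, and helper functions are hypothetical, and only temperature is recorded as an incubation condition; denaturant and salt concentrations could be added as further columns.

```python
import sqlite3


def create_database():
    """Create an in-memory database with one row per (sample, probe,
    temperature) hybridization measurement. The schema is an assumption."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE hybridization (
               sample_id TEXT,
               probe_id TEXT,
               temperature_c REAL,
               intensity REAL
           )"""
    )
    return conn


def add_record(conn, sample_id, probe_id, temperature_c, intensity):
    """Store one hybridization measurement."""
    conn.execute(
        "INSERT INTO hybridization VALUES (?, ?, ?, ?)",
        (sample_id, probe_id, temperature_c, intensity),
    )


def pattern_for(conn, sample_id, temperature_c):
    """Return the hybridization pattern of a sample at one temperature
    as a mapping from probe identifier to intensity."""
    rows = conn.execute(
        "SELECT probe_id, intensity FROM hybridization "
        "WHERE sample_id = ? AND temperature_c = ? ORDER BY probe_id",
        (sample_id, temperature_c),
    ).fetchall()
    return dict(rows)
```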

In another aspect, the invention features a computer including the database of the above
aspect and a user interface (i) capable of displaying a hybridization pattern for a nucleic acid of
interest or a population of nucleic acids of interest whose record is stored in the computer or (ii)
capable of displaying a nucleic acid of interest (e.g., displaying the polynucleotide sequence or
another identifying characteristic of the nucleic acid of interest) or a population of nucleic acids
of interest that produces a hybridization pattern whose record is stored in the computer.

Methods for Silencing a Target Nucleic Acid in a Cell or Animal

One method for inhibiting specific gene expression involves the use of antisense or
double stranded oligonucleotides which are complementary to a specific target messenger RNA
(mRNA) sequence, such as a specific mRNA splice variant. Of special interest are
oligonucleotides with a modified backbone (such as LNA or phosphorothioate) that are not
readily degraded by endonucleases in the target cells.

In one aspect, the invention features the use of a nucleic acid of the invention for the
manufacture of a pharmaceutical composition for treatment of a disease curable by an antisense
or RNAi technology.

In one aspect, the invention provides a method for inhibiting the expression of a target 
nucleic acid in a cell. The method involves introducing into the cell a nucleic acid of the
invention in an amount sufficient to specifically attenuate expression of the target nucleic acid. 
The introduced nucleic acid has a nucleotide sequence that is essentially complementary to a 
region of desirably at least 20 nucleotides of the target nucleic acid. Desirably, the cell is in a 
mammal. 

In a related aspect, the invention provides a method for preventing, stabilizing, or
treating a disease, disorder, or condition associated with a target nucleic acid in a mammal.
This method involves introducing into the mammal a nucleic acid of the invention in an amount
sufficient to specifically attenuate expression of the target nucleic acid, wherein the introduced
nucleic acid has a nucleotide sequence that is essentially complementary to a region of
desirably at least 20 nucleotides of the target nucleic acid.

In another aspect, the invention provides a method for preventing, stabilizing, or treating 
a pathogenic infection in a mammal by introducing into the mammal a nucleic acid of the 
invention in an amount sufficient to specifically attenuate expression of a target nucleic acid of 
a pathogen. The introduced nucleic acid has a nucleotide sequence that is essentially
complementary to a region of desirably at least 20 nucleotides of the target nucleic acid.

In desirable embodiments of the therapeutic methods of the above aspects, the mammal 
is a human. In some embodiments, the introduced nucleic acid is single stranded or double 
stranded. 

With respect to the therapeutic methods of the invention, it is not intended that the
administration of nucleic acids to a mammal be limited to a particular mode of administration,
dosage, or frequency of dosing; the present invention contemplates all modes of administration,
including oral, intraperitoneal, intramuscular, intravenous, intraarticular, intralesional,
subcutaneous, or any other route sufficient to provide a dose adequate to prevent or treat a
disease (e.g., a disease associated with the expression of a target nucleic acid that is silenced
with a nucleic acid of the invention). One or more nucleic acids may be administered to the
mammal in a single dose or multiple doses. When multiple doses are administered, the doses
may be separated from one another by, for example, one week, one month, one year, or ten
years. It is to be understood that, for any particular subject, specific dosage regimens should be
adjusted over time according to the individual need and the professional judgment of the person
administering or supervising the administration of the compositions.

Exemplary mammals that can be treated using the methods of the invention include
humans, primates such as monkeys, animals of veterinary interest (e.g., cows, sheep, goats,
buffalos, and horses), and domestic pets (e.g., dogs and cats). Exemplary cells in which one or
more target genes can be silenced using the methods of the invention include invertebrate,
plant, bacterial, yeast, and vertebrate (e.g., mammalian or human) cells.

Optimum dosages for gene silencing applications may vary depending on the relative
potency of individual oligonucleotides, and can generally be estimated based on EC50 values
found to be effective in in vitro and in vivo animal models. In general, dosage is from 0.001 µg
to 100 g per kg of body weight (e.g., 0.001 µg/kg to 1 g/kg), and may be given once or more
daily, weekly, monthly or yearly, or even once every 2 to 20 years (U.S.P.N. 6,440,739).
Persons of ordinary skill in the art can easily estimate repetition rates for dosing based on
measured residence times and concentrations of the drug in bodily fluids or tissues. Following
successful treatment, it may be desirable to have the patient undergo maintenance therapy to
prevent the recurrence of the disease state, wherein the oligonucleotide is administered in
maintenance doses, ranging from 0.001 µg to 100 g per kg of body weight (e.g., 0.001 µg/kg to
1 g/kg), once or more daily, to once every 20 years. If desired, conventional treatments may be
used in combination with the nucleic acids of the present invention.
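
The per-kilogram range stated above spans many orders of magnitude, which is easier to see when it is converted to an absolute amount for a given body weight. The following sketch is purely arithmetic; the function name and the 70 kg example weight are illustrative assumptions and are not taken from the specification.

```python
from typing import Tuple

UG_PER_G = 1_000_000  # micrograms per gram


def dose_range_ug(body_weight_kg: float,
                  low_ug_per_kg: float = 0.001,
                  high_g_per_kg: float = 100.0) -> Tuple[float, float]:
    """Convert the stated per-kg range into absolute (low, high) doses in micrograms.

    Defaults follow the general range in the text:
    0.001 ug per kg up to 100 g per kg of body weight.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low = low_ug_per_kg * body_weight_kg
    high = high_g_per_kg * UG_PER_G * body_weight_kg
    return low, high


if __name__ == "__main__":
    # Hypothetical 70 kg subject, general range and narrower exemplary range.
    print(dose_range_ug(70))
    print(dose_range_ug(70, high_g_per_kg=1.0))
```

The second call uses the narrower exemplary upper bound of 1 g/kg given in the parenthetical.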

Suitable carriers include, but are not limited to, saline, buffered saline, dextrose, water,
glycerol, ethanol, and combinations thereof. The composition can be adapted for the mode of
administration and can be in the form of, for example, a pill, tablet, capsule, spray, powder, or
liquid. In some embodiments, the pharmaceutical composition contains one or more
pharmaceutically acceptable additives suitable for the selected route and mode of
administration. These compositions may be administered by, without limitation, any parenteral
route including intravenous, intra-arterial, intramuscular, subcutaneous, intradermal,
intraperitoneal, intrathecal, as well as topically, orally, and by mucosal routes of delivery such
as intranasal, inhalation, rectal, vaginal, buccal, and sublingual. In some embodiments, the
pharmaceutical compositions of the invention are prepared for administration to vertebrate (e.g.,
mammalian) subjects in the form of liquids, including sterile, non-pyrogenic liquids for
injection, emulsions, powders, aerosols, tablets, capsules, enteric coated tablets, or
suppositories.

Exemplary Oligomers of the Invention and Methods for Synthesizing Them 

In desirable embodiments, the invention features a method of synthesizing a nucleic
acid. This method involves synthesizing a 2-thio-uridine nucleoside or nucleotide of formula
IV using a compound of formula VIII, IX, X, XI, or XII as shown below. The nucleoside,
nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention.





In a particular embodiment, nucleobase thiolation is performed on the O2 position of
compound XI to form compound IV. In another embodiment, sulphurization on both
O2 and O4 in compound VIII generates a 2,4-dithio-uridine nucleoside or nucleotide of
formula X which is converted into compound IV. In yet another embodiment, a cyclic
ether of formula XI is transformed into compound IV or a 2-O-alkyl-uridine nucleoside
or nucleotide of formula XII through reaction with the 5' position. In other
embodiments, a 2-O-alkyl-uridine nucleoside or nucleotide of formula XII is generated
by direct alkylation of a uridine nucleoside or nucleotide of formula VIII.

In desirable embodiments R 4 and R 2 are each independently alkyl (e.g., methyl or
ethyl), acyl (e.g., acetyl or benzoyl), or any appropriate protecting group such as silyl,
4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl). R 5 is any appropriate
protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, trityl(triphenylmethyl),
acetyl, benzoyl, or benzyl. In desirable embodiments, R 5 is hydrogen, alkyl (e.g., methyl or
ethyl), 1-propynyl, thiazol-2-yl, pyridine-2-yl, thien-2-yl, imidazol-2-yl, (4/5-methyl)-thiazol-2-yl,
3-(iodoacetamido)propyl, 4-[N,N-bis(3-aminopropyl)amino]butyl, or halo (e.g., chloro,
bromo, iodo, fluoro).

The group -OR 3' in the formulas IV, VIII, IX, X, XI, and XII is any of the groups listed
for R 3 or R 3 in formula Ia or formula Ib or listed for R 3 or R 3* in formula IIa, Scheme A, or
Scheme B, or the group -OR 3 or R 3 in the formulas IV, VIII, IX, X, XI, and XII is selected
from the group consisting of H, -OH, P(O(CH2)2CN)N(iPr)2, P(O(CH2)2CN)N(iPr)2, phosphate,
phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate,
phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro,
iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or
ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy,
hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl,
heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio,
heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene,
alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or
trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone
such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.

The group -OR 5 in the formulas IV, VIII, IX, X, and XII is any of the groups listed
for R 5 or R 5 in formula Ia or formula Ib or listed for R 5 or R 5* in formula IIa, Scheme A, or
Scheme B, or the group -OR 5 or R 5 in the formulas IV, VIII, IX, X, and XII is selected
from the group consisting of H, -OH, P(O(CH2)2CN)N(iPr)2, P(O(CH2)2CN)N(iPr)2, phosphate,
phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate,
phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro,
iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or
ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy,
hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl,
heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio,
heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene,
alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or
trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone
such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.

In yet another aspect, the invention features a method of synthesizing a nucleic acid.
This method involves synthesizing a 2-thiopyrimidine nucleoside or nucleotide of formula IV
using a compound of formula III or compounds of the formula I, II, and III as shown below.
The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of
the invention.

In some embodiments, Lewis acid-catalyzed condensation of a substituted sugar of formula I
and a substituted 2-thio-uracil of formula II results in a substituted 2-thio-uridine nucleoside or
nucleotide of the formula III. In some embodiments, a compound of formula III is converted
into an LNA 2-thiouridine nucleoside or nucleotide of formula IV.

In desirable embodiments R 4 and R 5 are, e.g., methanesulfonyloxy, p-toluenesulfonyloxy,
or any appropriate protecting group such as silyl, 4,4'-dimethoxytrityl,
monomethoxytrityl, trityl(triphenylmethyl), acetyl, benzoyl, or benzyl; R 1 is, e.g., acetyl,
benzoyl, or alkoxy (e.g., methoxy); R 2 is, e.g., acetyl or benzoyl; and R 3 is any appropriate
protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, trityl(triphenylmethyl),
acetyl, or benzoyl. In desirable embodiments, R 5 is hydrogen, alkyl (e.g., methyl or ethyl),
1-propynyl, thiazol-2-yl, pyridine-2-yl, thien-2-yl, imidazol-2-yl, (4/5-methyl)-thiazol-2-yl,
3-(iodoacetamido)propyl, 4-[N,N-bis(3-aminopropyl)amino]butyl, or halo (e.g., chloro, bromo,
iodo, fluoro).

The group -OR 3 in the formulas I, III, and IV is any of the groups listed for R 3 or R 3 in
formula Ia or formula Ib or listed for R 3 or R 3* in formula IIa, Scheme A, or Scheme B, or the
group -OR 3 or R 3 in the formulas I, III, and IV is selected from the group consisting of H, -OH,
P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate,
phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo
(e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl
(e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl,
hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl,
heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio,
heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene,
alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or
trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone
such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.

The group R 5 in the formulas I, III, and IV is any of the groups listed for R 5 or R 5 in
formula Ia or formula Ib or listed for R 5 or R 5* in formula IIa, Scheme A, or Scheme B, or R 5 in
the formulas I, III, and IV is selected from the group consisting of H, -OH,
P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate,
phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo
(e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl
(e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl,
hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl,
heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio,
heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene,
alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or
trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone
such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.

In still another aspect, the invention features a method of synthesizing a nucleic acid.
This method involves synthesizing a 2-thiopyrimidine nucleoside or nucleotide of formula IV
using a compound of formula VII, compounds of the formula V, VI, and VII, or compounds of
the formula I, V, VI, and VII as shown below. The nucleoside, nucleoside phosphoramidite, or
nucleotide is incorporated into a nucleic acid of the invention.

In some embodiments, a 2-thio-uridine nucleoside or nucleotide of the formula IV is
synthesized through ring-synthesis of the nucleobase by reaction of an amino sugar of the
formula V and a substituted isothiocyanate of the formula VI.

In desirable embodiments, R 4 and R 5 are each independently, e.g.,
methanesulfonyloxy, p-toluenesulfonyloxy, or any appropriate protecting group such as silyl,
4,4'-dimethoxytrityl, monomethoxytrityl, trityl(triphenylmethyl), acetyl, benzoyl, or benzyl. R 1
is, e.g., acetyl, benzoyl, or alkoxy (e.g., methoxy); R 2 is, e.g., acetyl or benzoyl; and R 3 is any
appropriate protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl,
trityl(triphenylmethyl), acetyl, or benzoyl. R 5 and R 6 are each independently, e.g., hydrogen or
alkyl (e.g., methyl or ethyl). R 6 can also be, e.g., an appropriate protecting group such as silyl,
4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl). In desirable embodiments,
R 5 is hydrogen or methyl, and R 6 is methyl or ethyl.

The group -OR 3 in the formulas I, V, VII, and IV is any of the groups listed for R 3 or R 3
in formula Ia or formula Ib or listed for R 3 or R 3* in formula IIa, Scheme A, or Scheme B, or the
group -OR 3 or R 3 in the formulas I, V, VII, and IV is selected from the group consisting of H,
-OH, P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate,
phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl
phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g.,
phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or
benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano,
carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine,
alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl,
alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl,
sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl,
monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine,
ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or
fluorescent labels), and biotin.

R 5 in the formulas I, V, VII, and IV is any of the groups listed for R 5 or R 5 in formula Ia
or formula Ib or listed for R 5 or R 5* in formula IIa, Scheme A, or Scheme B, or R 5 in the
formulas I, V, VII, and IV is selected from the group consisting of H, -OH,
P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate,
phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo
(e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl
(e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl,
hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl,
heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio,
heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene,
alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or
trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone
such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.

In another aspect, the invention features a method of synthesizing a nucleic acid. This
method involves synthesizing a 2-thiopyrimidine nucleoside as shown below. In desirable
embodiments, the method further comprises reacting one or both compounds of the formula 4
with a phosphodiamidite (e.g., 2-cyanoethyl tetraisopropylphosphorodiamidite) to produce the
corresponding nucleoside phosphoramidite. The nucleoside, nucleoside phosphoramidite, or
nucleotide is incorporated into a nucleic acid of the invention.

In some embodiments, a glycosyl-donor is coupled to a nucleobase as shown in pathway A. In
other embodiments, ring synthesis of the nucleobase is performed as shown in pathway B. In
still other embodiments, LNA-T diol is modified as shown in pathway C.

In desirable embodiments, R is hydrogen, methyl, 1-propynyl, thiazol-2-yl, pyridine-2-yl,
thien-2-yl, imidazol-2-yl, (4/5-methyl)-thiazol-2-yl, 3-(iodoacetamido)propyl,
4-[N,N-bis(3-aminopropyl)amino]butyl, or halo (e.g., chloro, bromo, iodo, fluoro). Desirably, R1, R2, and R3
are each any appropriate protecting group such as acetyl, benzyl, silyl, 4,4'-dimethoxytrityl,
monomethoxytrityl, or trityl(triphenylmethyl).

In another aspect, the invention features a method of synthesizing a nucleic acid. This
method involves synthesizing a 2-thiopyrimidine nucleoside or nucleotide of formula 4 using a
compound of formula 3, compounds of the formula 2 and 3, or compounds of the formula 1, 2,
3, and 4 as shown below. The nucleoside, nucleoside phosphoramidite, or nucleotide is
incorporated into a nucleic acid of the invention. This method can also be performed using any
other appropriate protecting groups instead of Bn (benzyl), Ac (acetyl), or Ms
(methanesulfonyl).

In desirable embodiments, the method further comprises reacting one or both compounds of the
formula 4 with a phosphodiamidite (e.g., 2-cyanoethyl tetraisopropylphosphorodiamidite) to
produce the corresponding nucleoside phosphoramidite.

In another aspect, the invention features a method of synthesizing a nucleic acid. This
method involves synthesizing a nucleoside or nucleotide of formula 10 or 11 using a compound
of any one of the formulas 6-9, compounds of the formula 5 and any one of the formulas 6-9, or
compounds of the formula 4, 5, and any one of the formulas 6-9 as shown below. The
nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the
invention. This method can also be performed using any other appropriate protecting groups
instead of DMT, Bn, Ac, or Ms.

In some embodiments, a compound of formula 4 is used as a glycosyl donor in a coupling
reaction with silylated hypoxanthine to form a compound of the formula 5. In certain
embodiments, a compound of the formula 5 is used in a ring closing reaction to form a
compound of the formula 6. Desirably, deprotection of the 5'-hydroxy group of compound 6 is
performed by displacing the 5'-O-mesyl group with sodium benzoate to produce a compound of
the formula 7 that is converted into a compound of the formula 8 after saponification of the
5'-benzoate. In some embodiments, compound 8 is converted to a DMT-protected compound 9
prior to debenzylation of the 3'-O-hydroxy group. In desirable embodiments, a
phosphoramidite of the formula 11 is generated by phosphitylation of a nucleoside of the
formula 10.

In desirable embodiments, R1 is H or P(O(CH2)2CN)N(iPr)2. In other embodiments,
the group R1 or -OR1 is any of the groups listed for R 3 or R 3 in formula Ia or formula Ib or listed
for R 3 or R 3* in formula IIa, Scheme A, or Scheme B, or the group
-OR1 or R1 is selected from the group consisting of -OH, P(O(CH2)2CN)N(iPr)2, phosphate,
phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate,
phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro,
iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or
ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy,
hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl,
heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio,
heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene,
alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or
trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone
such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.

In another aspect, the invention features a method of synthesizing a nucleic acid. This
method involves synthesizing a nucleoside or nucleotide of formula 20 or 21 as shown below,
in which compound 4 is the same sugar shown in the above aspect. The nucleoside, nucleoside
phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention. This
method can also be performed using any other appropriate protecting groups instead of DMT,
Bn, Bz (benzoyl), Ac, or Ms. Additionally, the method can be performed with any other
halogen (e.g., fluoro or bromo) instead of chloro.

In desirable embodiments, to promote the ring closing reaction, a solution of compound 14 in
aqueous 1,4-dioxane is treated with sodium hydroxide to give a bicyclic compound 15. In some
embodiments, sodium benzoate is used for displacement of the 5'-mesylate of compound 15 to give
compound 16. In some embodiments, compound 17 is formed by reaction of compound 16
with sodium azide. In some embodiments, compound 18 is produced by saponification of the
5'-benzoate of compound 17. In certain embodiments, hydrogenation of compound 18 produces
compound 19. In certain embodiments, the peracylation method is used to benzoylate the 2-
and 6-amino groups of compound 19, yielding 20, which is desirably converted into the
phosphoramidite compound 21.

In a related aspect, the invention features a derivative of a compound of the formula 20 or 21 as
described in the above aspect in which the 3'-OH or -OP(O(CH2)2CN)N(iPr)2 group is replaced by
any other group selected from the group consisting of phosphorothioate, phosphorodithioate,
phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl
phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g.,
phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or
benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano,
carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine,
alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl,
alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl,
sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl,
monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine,
ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or
fluorescent labels), and biotin.

In yet another aspect, the invention features a method of synthesizing a nucleic acid.
This method involves synthesizing a nucleoside or nucleotide of formula 20 or 21 as shown
below. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a
nucleic acid of the invention. This method can also be performed using any other appropriate
protecting groups instead of DMT.

20: R = H
21: R = P(O(CH2)2CN)N(iPr)2

In some embodiments, compound 17 is formed by reaction of compound 7 with
1,3-dichloro-1,1,3,3-tetraisopropyldisiloxane. Desirably, compound 18 is formed by reaction of compound
17 with phenoxyacetic anhydride. In some embodiments, compound 19 is generated by
reaction of compound 18 with acid. Desirably, compound 20 is produced by reacting
compound 19 with DMT-Cl. In desirable embodiments, compound 20 is reacted with
2-cyanoethyl tetraisopropylphosphorodiamidite to give the phosphoramidite 21.

In desirable embodiments, R is H or P(O(CH2)2CN)N(iPr)2. In other embodiments,
R or -OR is any of the groups listed for R 3 or R 3 in formula Ia or formula Ib or listed for R 3
or R 3* in formula IIa, Scheme A, or Scheme B, or the group
-OR or R is selected from the group consisting of -OH, P(O(CH2)2CN)N(iPr)2, phosphate,
phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate,
phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro,
iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or
ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy,
hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl,
heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio,
heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene,
alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or
trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone
such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.

In yet another aspect, the invention features a method of synthesizing a nucleic acid.
This method involves synthesizing a nucleoside or nucleotide of formula 24 or 25 as shown
below. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a
nucleic acid of the invention. This method can also be performed using any other appropriate
protecting groups instead of Bz, Bn, and DMT. Additionally, the method can be performed
with any other halogen (e.g., fluoro or bromo) instead of chloro.

In some embodiments, the compound 16 is formed from compounds 4, 14, and 15 as
illustrated in an aspect above. Desirably, the 5'-O-benzoyl group of compound 16 is hydrolyzed
by aqueous sodium hydroxide to give compound 22. Compound 23 is desirably produced by
incubation of compound 22 in the presence of palladium hydroxide and ammonium formate.
Desirably, the 2-amine of compound 23 is selectively protected with an amidine group after
treatment with N,N-dimethylformamide dimethyl acetal to yield compound 24. In some
embodiments, the diol 24 is 5'-O-DMT protected and 3'-O-phosphitylated to produce the
phosphoramidite LNA-2AP compound 25.

In some embodiments, compound 25 has one of the following groups instead of the
P(O(CH2)2CN)N(iPr)2 group: any of the groups listed for R 3 or R 3' in formula Ia or formula Ib
or listed for R 3 or R 3* in formula IIa, Scheme A, or Scheme B, or a group
selected from the group consisting of -OH, phosphate, phosphorothioate, phosphorodithioate,
phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl
phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g.,
phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or
benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano,
carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine,
alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl,
alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl,
sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl,
monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine,
ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or
fluorescent labels), and biotin.

In another aspect, the invention features a nucleic acid of the invention that includes a
compound of the formula 6pC or the product of a compound of the formula 6pC treated with
ammonia as described herein. In a related aspect, the invention features a method of
synthesizing a nucleic acid that involves performing one or more of the steps described herein
for the synthesis of a compound of the formula 6pC or the product of a compound of the
formula 6pC treated with ammonia.

In yet another aspect, the invention features a method of synthesizing a nucleic acid. This
method involves reacting one or more of any of the nucleosides or nucleotides of the invention with (i)
any other nucleoside or nucleotide of the invention, (ii) any other nucleoside or nucleotide of
formula Ia, formula Ib, formula IIa, Scheme A, or Scheme B, and/or (iii) any
naturally-occurring nucleoside or nucleotide. Desirably, the method involves reacting one or more
nucleoside phosphoramidites of any of the above aspects with a nucleotide or nucleic acid.

Methods for Synthesis of Nucleic Acids on a Solid Support

In another aspect, the invention provides a method for the synthesis of a
population of nucleic acids (e.g., a population of nucleic acids of the invention) on a
solid support. This method involves the reaction of a plurality of nucleoside
phosphoramidites with an activated solid support (e.g., a solid support with an activated
linker) and the subsequent reaction of a plurality of nucleoside phosphoramidites with
activated nucleotides or nucleic acids bound to the solid support.

In some embodiments of any of the above aspects, the solid support or the
growing nucleic acid bound to the solid support is activated by illumination, a
photogenerated acid, or electric current. In desirable embodiments, one or more spots or
regions (e.g., a region with an area of less than 1 cm², 0.1 cm², 0.01 cm², 1 mm², or 0.1
mm² that desirably contains one particular nucleic acid monomer or oligomer) on the
solid support are irradiated to produce a photogenerated acid that removes the 5'-OH
protecting group of one or more nucleic acid monomers or oligomers to which a
nucleotide is subsequently added. In other embodiments, an electric current is applied
to one or more spots or regions (e.g., a region with an area of less than 1 cm², 0.1 cm²,
0.01 cm², 1 mm², or 0.1 mm² that desirably contains one particular nucleic acid
monomer or oligomer) on the solid support to remove an electrochemically sensitive
protecting group of one or more nucleic acid monomers or oligomers to which a
nucleotide is subsequently added. In still other embodiments, one or more spots or
regions (e.g., a region with an area of less than 1 cm², 0.1 cm², 0.01 cm², 1 mm², or 0.1
mm² that desirably contains one particular nucleic acid monomer or oligomer) on the
solid support are irradiated to remove a photosensitive protecting group of one or more
nucleic acid monomers or oligomers to which a nucleotide is subsequently added. In
various embodiments, the solid support (e.g., chip, coverslip, microscope glass slide,
quartz, or silicon) is less than 1, 0.5, 0.1, or 0.05 mm thick.
Methods for the Synthesis of Nucleic Acids 

In another aspect, the invention features a method of reacting a population of
nucleic acids of the invention with one or more nucleic acids. This method involves
incubating an immobilized population of nucleic acids of the invention with a solution
that includes one or more probes (e.g., at least 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 60, 80,
100, or 150 different nucleic acids) and one or more target nucleic acids (e.g., at least 2,
3, 4, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, or 150 different target nucleic acids). The
incubation is performed in the presence of a ligase under conditions that allow the ligase
to covalently react one or more immobilized nucleic acids with one or more nucleic acid
probes in solution that hybridize to the same target nucleic acid. Desirably, at least 2, 5,
10, 15, 20, 30, 40, 50, 80, or 100 pairs of immobilized nucleic acids and nucleic acid
probes are ligated. In various embodiments, the incubation occurs between 15 and
45°C, such as between 20 and 40°C or between 25 and 35°C.

Desirable Embodiments of Any of the Aspects of the Invention

In other embodiments of any of various aspects of the invention, a nucleic acid probe or
primer specifically hybridizes to a target nucleic acid but does not substantially hybridize to
non-target molecules, which include other nucleic acids in a cell or biological sample having a
sequence that is less than 99, 95, 90, 80, or 70% identical or complementary to that of the target
nucleic acid. Desirably, the amount of these non-target molecules hybridized to, or
associated with, the nucleic acid probe or primer, as measured using standard assays, is 2-fold,
desirably 5-fold, more desirably 10-fold, and most desirably 50-fold lower than the amount of
the target nucleic acid hybridized to, or associated with, the nucleic acid probe or primer. In
other embodiments, the amount of a target nucleic acid hybridized to, or associated with, the
nucleic acid probe or primer, as measured using standard assays, is 2-fold, desirably 5-fold,
more desirably 10-fold, and most desirably 50-fold greater than the amount of a control nucleic
acid hybridized to, or associated with, the nucleic acid probe or primer. In certain
embodiments, the nucleic acid probe or primer RNA is substantially complementary (e.g., at
least 80, 90, 95, 98, or 100% complementary) to a target nucleic acid or a group of target
nucleic acids from a cell. In other embodiments, the probe or primer is homologous to multiple
RNA or DNA molecules, such as RNA or DNA molecules from the same gene family. In other
embodiments, the probe or primer is homologous to a large number of RNA or DNA
molecules. In desirable embodiments, the probe or primer binds to nucleic acids which have
polynucleotide sequences that differ in sequence at a position that corresponds to the position of
a universal base in the probe or primer. Examples of control nucleic acids include nucleic acids
with a random sequence or nucleic acids known to have little, if any, affinity for the nucleic
acid probe or primer. In some embodiments, the target nucleic acid is an RNA, DNA, or cDNA
molecule.

Desirably, the association constant (Ka) of the nucleic acid toward a complementary
target molecule is higher than the association constant of the complementary strands of the
double stranded target molecule. In some desirable embodiments, the melting temperature of a
duplex between the nucleic acid and a complementary target molecule is higher than the
melting temperature of the complementary strands of the double stranded target molecule.

In some embodiments, the LNA-pyrene is in a position corresponding to the position of
a non-base (e.g., a unit without a base) in another nucleic acid, such as a target nucleic acid.
Incorporation of pyrene in a DNA strand that is hybridized against the four natural bases
decreases the Tm by 4.5°C to 6.8°C; however, incorporation of pyrene in a DNA strand in a
position opposite a non-base only decreases the Tm by 2.3°C to 4.6°C, most likely due to the
better accommodation of the pyrene in the B-type duplex (Matray and Kool, J. Am. Chem. Soc.
120, 6191, 1998). Thus, incorporation of LNA-pyrene into a nucleic acid in a position opposite
a non-base (e.g., a unit without a base or a unit with a small group such as a noncyclic group
instead of a base) in a target nucleic acid may also minimize any potential decrease in Tm due to
the pyrene substitution.

In various embodiments, the number of molecules in the plurality of nucleic acids of the
invention is at least 2, 4, 5, 6, 7, 8, or 10-fold greater than the number of molecules in the test
nucleic acid sample. In some embodiments, an LNA is a triplex-forming oligonucleotide.

In desirable embodiments of any of the aspects of the invention, the target nucleic acids
(e.g., cDNA molecules reverse transcribed from a patient sample) are fragmented using an
enzyme such as a uracil-DNA glycosylase (e.g., E. coli uracil-DNA glycosylase) or using
chemical hydrolysis such as alkaline hydrolysis. In various embodiments, the average size of
the fragmented nucleic acids is between 50 and 300 nucleotides, such as approximately 300,
200, 100, or 50 nucleotides.
Advantages 

The present invention has a variety of advantages related to nucleic acid analysis
methods. The ability to equalize melting temperatures of a series of nucleic acids is
generally applicable and desirable in all situations where more than one sequence is
used simultaneously (e.g., DNA arrays with more than one capture probe, PCR and
especially multiplex PCR, and homogeneous assays such as TaqMan and molecular beacons).
Sample preparation of specific sequences (e.g., DNA or RNA extraction using capture
probes on filters or magnetic beads) is another area where melting temperature
equalization of specific probe sequences is useful.

For example, the invention provides high affinity nucleotides (e.g., LNA and other high
affinity nucleotides with a modified base and/or backbone) that can be used, e.g., in arrays of the
invention. In particular, the nucleic acids of the invention containing LNA units exhibited a
surprising ability to discriminate between different mRNA splice variants compared to
naturally-occurring nucleic acids. If desired, universal bases can be added as part of flanking
regions in capture probes (e.g., probes of an array) to stabilize hybridization with high affinity
nucleotides in the capture probes. Replacement of one or more DNA-t nucleotides with LNA-T
and/or replacement of one or more DNA-a nucleotides with LNA-A reduces the variability of
melting temperatures for capture probes of similar length but different GC and AT content by
desirably at least 10, 20, 30, 40, or 50%. Additionally, replacement of one or more DNA-t
nucleotides with LNA-T and/or replacement of one or more DNA-c with LNA-C increases the
stability of a large number of capture probes, while desirably avoiding self-complementary
sequences with LNA:LNA base-pairs within a capture probe that would otherwise reduce or
eliminate the binding of target molecules to the probe. Although a general T and C substitution
may not reduce the variability of melting temperatures of the probes, this substitution increases
the melting temperature and binding efficiency of many capture probes that contain these two
nucleotides.

The invention also provides a general substitution algorithm for enhancement of the
hybridization signal of a test nucleic acid sample by inclusion of high affinity monomers (e.g.,
LNA and other high affinity nucleotides with a modified base and/or backbone) in the array.
This method increases the stability and binding affinity of capture probes while avoiding
substitutions in positions that may form self-complementary base-pairs which may otherwise
inhibit binding to a target molecule. The substitution algorithm is broadly useful for specialized
arrays, as well as for PCR primers and FISH probes.
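
The general T and C substitution described above can be illustrated with a minimal Python
sketch. It is not the substitution algorithm of the invention; it only shows, under stated
assumptions, why substituting T and C alone cannot create an LNA:LNA base-pair within a probe.
The function names and the lowercase (DNA) versus uppercase (LNA) notation are hypothetical
conventions chosen for this illustration.

```python
# Illustrative sketch only: lowercase letters denote DNA monomers and
# uppercase letters denote LNA monomers (an assumed notation).
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def substitute_lna(probe: str, lna_bases=("t", "c")) -> str:
    """Return the probe with every base listed in lna_bases marked as LNA."""
    probe = probe.lower()
    invalid = set(probe) - set(COMPLEMENT)
    if invalid:
        raise ValueError(f"unsupported bases: {sorted(invalid)}")
    return "".join(b.upper() if b in lna_bases else b for b in probe)


def can_form_lna_lna_pair(lna_bases) -> bool:
    """True if two substituted base types are Watson-Crick complements,
    i.e. a self-complementary stretch could contain an LNA:LNA base-pair."""
    chosen = {b.lower() for b in lna_bases}
    return any(COMPLEMENT[b] in chosen for b in chosen)


print(substitute_lna("gtgaaatgc"))          # T and C positions become LNA
print(can_form_lna_lna_pair(("t", "c")))    # T pairs with A, C with G
print(can_form_lna_lna_pair(("t", "a")))    # T and A pair with each other
```

Because T pairs with A and C pairs with G, a probe in which only T and C are substituted has no
two LNA monomers that can pair with each other, whereas a combined T and A substitution would
require a position-by-position check of self-complementary regions.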

Other features and advantages of the invention will be apparent from the following 
detailed description. 
Definitions 

When used herein, the term "LNA" (Locked Nucleoside Analogues) refers to nucleoside
analogues (e.g., bicyclic nucleoside analogues, e.g., as disclosed in WO 99/14226) either
incorporated in an oligonucleotide or as a discrete chemical species (e.g., LNA nucleoside and
LNA nucleotide). The term "monomeric LNA" may, e.g., refer to the monomers LNA A, LNA
T, LNA C, or any other LNA monomers.

By "LNA unit" is meant an individual LNA monomer (e.g., an LNA nucleoside or LNA
nucleotide) or an oligomer (e.g., an oligonucleotide or nucleic acid) that includes at least one
LNA monomer. LNA units as disclosed in WO 99/14226 are in general particularly desirable
modified nucleic acids for incorporation into an oligonucleotide of the invention. Additionally,
the nucleic acids may be modified at the 3' and/or 5' end by any type of modification
known in the art. For example, either or both ends may be capped with a protecting group,
attached to a flexible linking group, attached to a reactive group to aid in attachment to the
substrate surface, etc. Desirable LNA units and their method of synthesis also are disclosed in
WO 00/56746, WO 00/56748, WO 00/66604, Morita et al., Bioorg. Med. Chem. Lett. 12(1):73-76,
2002; Hakansson et al., Bioorg. Med. Chem. Lett. 11(7):935-938, 2001; Koshkin et al., J.
Org. Chem. 66(25):8504-8512, 2001; Kvaerno et al., J. Org. Chem. 66(16):5498-5503, 2001;
Hakansson et al., J. Org. Chem. 65(17):5161-5166, 2000; Kvaerno et al., J. Org. Chem.
65(17):5167-5176, 2000; Pfundheller et al., Nucleosides Nucleotides 18(9):2017-2030, 1999;
and Kumar et al., Bioorg. Med. Chem. Lett. 8(16):2219-2222, 1998.

By "LNA modified oligonucleotide" is meant an oligonucleotide comprising at least one
LNA monomeric unit of the general scheme A, described infra, having the below described
illustrative examples of modifications:


wherein X is selected from -O-, -S-, -N(RN*)-, -C(R6R6*)-, -O-C(R7R7*)-, -C(R6R6*)-O-,
-S-C(R7R7*)-, -C(R6R6*)-S-, -N(RN*)-C(R7R7*)-, -C(R6R6*)-N(RN*)-, and -C(R6R6*)-C(R7R7*)-.

B is selected from a modified base as discussed above, e.g., an optionally substituted
carbocyclic aryl such as optionally substituted pyrene or optionally substituted
pyrenylmethylglycerol, or an optionally substituted heteroalicyclic or optionally substituted
heteroaromatic such as optionally substituted pyridyloxazole, optionally substituted pyrrole,
optionally substituted diazole or optionally substituted triazole moieties; hydrogen, hydroxy,
optionally substituted C1-4-alkoxy, optionally substituted C1-4-alkyl, optionally substituted
C1-4-acyloxy, nucleobases, DNA intercalators, photochemically active groups, thermochemically
active groups, chelating groups, reporter groups, and ligands.

P designates the radical position for an internucleoside linkage to a succeeding
monomer, or a 5'-terminal group, such internucleoside linkage or 5'-terminal group optionally
including the substituent R5. One of the substituents R2, R2*, R3, and R3* is a group P* which
designates an internucleoside linkage to a preceding monomer, or a 2'/3'-terminal group. The
substituents of R1*, R4*, R5, R5*, R6, R6*, R7, R7*, RN*, and the ones of R2, R2*, R3, and R3* not
designating P* each designates a biradical comprising about 1-8 groups/atoms selected from
-C(RaRb)-, -C(Ra)=C(Ra)-, -C(Ra)=N-, -C(Ra)-O-, -O-, -Si(Ra)2-, -C(Ra)-S-, -S-, -SO2-,
-C(Ra)-N(Rb)-, -N(Ra)-, and >C=Q, wherein Q is selected from -O-, -S-, and -N(Ra)-, and Ra and Rb
each is independently selected from hydrogen, optionally substituted C1-12-alkyl, optionally
substituted C2-12-alkenyl, optionally substituted C2-12-alkynyl, hydroxy, C1-12-alkoxy,
C2-12-alkenyloxy, carboxy, C1-12-alkoxycarbonyl, C1-12-alkylcarbonyl, formyl, aryl,
aryloxy-carbonyl, aryloxy, arylcarbonyl, heteroaryl, heteroaryloxy-carbonyl, heteroaryloxy,
heteroarylcarbonyl, amino, mono- and di(C1-6-alkyl)amino, carbamoyl, mono- and
di(C1-6-alkyl)-amino-carbonyl, amino-C1-6-alkyl-aminocarbonyl, mono- and
di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl, C1-6-alkyl-carbonylamino, carbamido,
C1-6-alkanoyloxy, sulphono, C1-6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1-6-alkylthio,
halogen, DNA intercalators, photochemically active groups, thermochemically active
groups, chelating groups, reporter groups, and ligands, where
aryl and heteroaryl may be optionally substituted, and where two geminal substituents Ra and
Rb together may designate optionally substituted methylene (=CH2), and wherein two
non-geminal or geminal substituents selected from Ra, Rb, and any of the substituents R1*, R2, R2*,
R3, R3*, R4*, R5, R5*, R6 and R6*, R7, and R7* which are present and not involved in P, P* or the
biradical(s) together may form an associated biradical selected from biradicals of the same kind
as defined before; the pair(s) of non-geminal substituents thereby forming a mono- or bicyclic
entity together with (i) the atoms to which said non-geminal substituents are bound and (ii) any
intervening atoms.

Each of the substituents R1*, R2, R2*, R3, R4*, R5, R5*, R6 and R6*, R7, and R7* which are
present and not involved in P, P* or the biradical(s), is independently selected from hydrogen,
optionally substituted C1-12-alkyl, optionally substituted C2-12-alkenyl, optionally substituted
C2-12-alkynyl, hydroxy, C1-12-alkoxy, C2-12-alkenyloxy, carboxy, C1-12-alkoxycarbonyl,
C1-12-alkylcarbonyl, formyl, aryl, aryloxy-carbonyl, aryloxy, arylcarbonyl, heteroaryl,
heteroaryloxy-carbonyl, heteroaryloxy, heteroarylcarbonyl, amino, mono- and
di(C1-6-alkyl)amino, carbamoyl, mono- and di(C1-6-alkyl)-amino-carbonyl,
amino-C1-6-alkyl-aminocarbonyl, mono- and di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl,
C1-6-alkyl-carbonylamino, carbamido, C1-6-alkanoyloxy,
sulphono, C1-6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1-6-alkylthio, halogen, DNA
intercalators, photochemically active groups, thermochemically active groups, chelating groups,
reporter groups, and ligands, where aryl and heteroaryl may be optionally substituted, and
where two geminal substituents together may designate oxo, thioxo, imino, or optionally
substituted methylene, or together may form a spiro biradical consisting of a 1-5 carbon atom(s)
alkylene chain which is optionally interrupted and/or terminated by one or more
heteroatoms/groups selected from -O-, -S-, and -(NRN)- where RN is selected from hydrogen
and C1-4-alkyl, and where two adjacent (non-geminal) substituents may designate an additional
bond resulting in a double bond; and RN*, when present and not involved in a biradical, is
selected from hydrogen and C1-4-alkyl; and basic salts and acid addition salts thereof.

Exemplary 5', 3', and/or 2' terminal groups include -H, -OH, halo (e.g., chloro, fluoro,
iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or
ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy,
hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl,
heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio,
heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene,
alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or
trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone
such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.

It is understood that references herein to a nucleic acid unit, nucleic acid residue, LNA 
unit, or similar term are inclusive of both individual nucleoside units and nucleotide units and 
nucleoside units and nucleotide units within an oligonucleotide. 

A "modified base" or other similar term refers to a composition (e.g., a non-naturally
occurring nucleobase or nucleosidic base) which can pair with a natural base (e.g., adenine,
guanine, cytosine, uracil, and/or thymine) and/or can pair with a non-naturally occurring
nucleobase or nucleosidic base. Desirably, the modified base provides a Tm differential of 15,
12, 10, 8, 6, 4, or 2°C or less as described herein. Exemplary modified bases are described in
EP 1 072 679 and WO 97/12896.

By "nucleobase" is meant the naturally occurring nucleobases adenine (A), guanine (G),
cytosine (C), thymine (T) and uracil (U) as well as non-naturally occurring nucleobases such as
xanthine, diaminopurine, 8-oxo-N6-methyladenine, 7-deazaxanthine, 7-deazaguanine,
N4,N4-ethanocytosine, N6,N6-ethano-2,6-diaminopurine, 5-methylcytosine (mC),
5-(C3-C6)-alkynyl-cytosine, 5-fluorouracil, 5-bromouracil, pseudoisocytosine,
2-hydroxy-5-methyl-4-triazolopyridine, isocytosine, isoguanine, inosine and the "non-naturally
occurring" nucleobases described in Benner et al., U.S. Pat. No. 5,432,272 and Susan M. Freier
and Karl-Heinz Altmann, Nucleic Acids Research, 1997, vol. 25, pp 4429-4443. The term
"nucleobase" thus includes not only the known purine and pyrimidine heterocycles, but also
heterocyclic analogues and tautomers thereof. Further naturally and non-naturally occurring
nucleobases include those disclosed in U.S. Pat. No. 3,687,808 (Merigan, et al.), in Chapter 15
by Sanghvi, in Antisense Research and Application, Ed. S. T. Crooke and B. Lebleu, CRC Press,
1993, in Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613-722 (see
especially pages 622 and 623), in the Concise Encyclopedia of Polymer Science and Engineering,
J. I. Kroschwitz Ed., John Wiley & Sons, 1990, pages 858-859, and in Cook, Anti-Cancer Drug
Design 1991, 6, 585-607, each of which is hereby incorporated by reference in its entirety. The
term "nucleosidic base" or "base unit" is further intended to include compounds such as
heterocyclic compounds that can serve like nucleobases including certain "universal bases" that
are not nucleosidic bases in the most classical sense but serve as nucleosidic bases. Especially
mentioned as universal bases are 3-nitropyrrole, optionally substituted indoles (e.g.,
5-nitroindole), and optionally substituted hypoxanthine. Other desirable universal bases include
pyrrole, diazole or triazole derivatives, including those universal bases known in the art.

As described herein, various groups of an LNA unit may be optionally substituted. A
"substituted" group such as a nucleobase or nucleosidic base and the like may be substituted by
other than hydrogen at one or more available positions, typically 1 to 3 or 4 positions, by one or
more suitable groups such as those disclosed herein. Suitable groups that may be present on a
"substituted" group include, e.g., halogen such as fluoro, chloro, bromo and iodo; cyano;
hydroxyl; nitro; azido; alkanoyl such as a C1-6 alkanoyl group such as acyl and the like;
carboxamido; alkyl groups including those groups having 1 to about 12 carbon atoms, or 1, 2, 3,
4, 5, or 6 carbon atoms; alkenyl and alkynyl groups including groups having one or more
unsaturated linkages and from 2 to 12 carbon atoms, or 2, 3, 4, 5, or 6 carbon atoms; alkoxy groups
including those having one or more oxygen linkages and from 1 to about 12 carbon atoms, or 1,
2, 3, 4, 5, or 6 carbon atoms; aryloxy such as phenoxy; alkylthio groups including those moieties
having one or more thioether linkages and from 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5, or 6
carbon atoms; alkylsulfinyl groups including those moieties having one or more sulfinyl
linkages and from 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5, or 6 carbon atoms; alkylsulfonyl
groups including those moieties having one or more sulfonyl linkages and from 1 to about 12
carbon atoms, or 1, 2, 3, 4, 5, or 6 carbon atoms; aminoalkyl groups such as groups having one
or more N atoms and from 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5, or 6 carbon atoms;
carbocyclic aryl having 6 or more carbons; aralkyl having 1 to 3 separate or fused rings and
from 6 to about 18 carbon ring atoms, with benzyl being a desirable group; aralkoxy having 1 to
3 separate or fused rings and from 6 to about 18 carbon ring atoms, with O-benzyl being a
desirable group; or a heteroaromatic or heteroalicyclic group having 1 to 3 separate or fused
rings with 3 to about 8 members per ring and one or more N, O or S atoms, e.g., coumarinyl,
quinolinyl, pyridyl, pyrazinyl, pyrimidyl, furyl, pyrrolyl, thienyl, thiazolyl, oxazolyl,
imidazolyl, indolyl, benzofuranyl, benzothiazolyl, tetrahydrofuranyl, tetrahydropyranyl,
piperidinyl, morpholino and pyrrolidinyl.

By "oxy-LNA monomer or unit" is meant any nucleoside or nucleotide which contains
an oxygen atom in a 2'-4' linkage.

A "non-oxy-LNA" monomer or unit is broadly defined as any nucleoside or nucleotide
which does not contain an oxygen atom in a 2'-4' linkage. Examples of non-oxy-LNA
monomers include 2'-deoxynucleotides (DNA) or nucleotides (RNA) or any analogues of these
monomers which are not oxy-LNA, such as for example the thio-LNA and amino-LNA
described herein with respect to formula Ia and in Singh et al., J. Org. Chem. 1998, 63, 6078-9,
and the derivatives described in Susan M. Freier and Karl-Heinz Altmann, Nucleic Acids
Research, 1997, vol. 25, pp 4429-4443.

By "universal base" is meant a naturally-occurring or desirably a non-naturally
occurring compound or moiety that can pair with a natural base (e.g., adenine, guanine,
cytosine, uracil, and/or thymine), and that has a Tm differential of 15, 12, 10, 8, 6, 4, or 2°C or
less as described herein.

By "oligonucleotide," "oligomer," or "oligo" is meant a successive chain of monomers
(e.g., glycosides of heterocyclic bases) connected via internucleoside linkages. The linkage
between two successive monomers in the oligo consists of 2 to 4, desirably 3, groups/atoms
selected from -CH2-, -O-, -S-, -NRH-, >C=O, >C=NRH, >C=S, -Si(R")2-, -SO-, -S(O)2-, -P(O)2-,
-PO(BH3)-, -P(O,S)-, -P(S)2-, -PO(R")-, -PO(OCH3)-, and -PO(NHRH)-, where RH is selected
from hydrogen and C1-4-alkyl, and R" is selected from C1-6-alkyl and phenyl. Illustrative
examples of such linkages are -CH2-CH2-CH2-, -CH2-CO-CH2-, -CH2-CHOH-CH2-, -O-CH2-O-,
-O-CH2-CH2-, -O-CH2-CH= (including R5 when used as a linkage to a succeeding monomer),
-CH2-CH2-O-, -NRH-CH2-CH2-, -CH2-CH2-NRH-, -CH2-NRH-CH2-, -O-CH2-CH2-NRH-,
-NRH-CO-O-, -NRH-CO-NRH-, -NRH-CS-NRH-, -NRH-C(=NRH)-NRH-, -NRH-CO-CH2-NRH-,
-O-CO-O-, -O-CO-CH2-O-, -O-CH2-CO-O-, -CH2-CO-NRH-, -O-CO-NRH-, -NRH-CO-CH2-,
-O-CH2-CO-NRH-, -O-CH2-CH2-NRH-, -CH=N-O-, -CH2-NRH-O-, -CH2-O-N= (including R5
when used as a linkage to a succeeding monomer), -CH2-O-NRH-, -CO-NRH-CH2-,
-CH2-NRH-O-, -CH2-NRH-CO-, -O-NRH-CH2-, -O-NRH-, -O-CH2-S-, -S-CH2-O-, -CH2-CH2-S-,
-O-CH2-CH2-S-, -S-CH2-CH= (including R5 when used as a linkage to a succeeding monomer),
-S-CH2-CH2-, -S-CH2-CH2-O-, -S-CH2-CH2-S-, -CH2-S-CH2-, -CH2-SO-CH2-, -CH2-SO2-CH2-,
-O-SO-O-, -O-S(O)2-O-, -O-S(O)2-CH2-, -O-S(O)2-NRH-, -NRH-S(O)2-CH2-, -O-S(O)2-CH2-,
-O-P(O)2-O-, -O-P(O,S)-O-, -O-P(S)2-O-, -S-P(O)2-O-, -S-P(O,S)-O-, -S-P(S)2-O-, -O-P(O)2-S-,
-O-P(O,S)-S-, -O-P(S)2-S-, -S-P(O)2-S-, -S-P(O,S)-S-, -S-P(S)2-S-, -O-PO(R")-O-,
-O-PO(OCH3)-O-, -O-PO(OCH2CH3)-O-, -O-PO(OCH2CH2S-R)-O-, -O-PO(BH3)-O-,
-O-PO(NHRN)-O-, -O-P(O)2-NRH-, -NRH-P(O)2-O-, -O-P(O,NRH)-O-, -CH2-P(O)2-O-,
-O-P(O)2-CH2-, and -O-Si(R")2-O-; among which -CH2-CO-NRH-, -CH2-NRH-O-, -S-CH2-O-,
-O-P(O)2-O-, -O-P(O,S)-O-, -O-P(S)2-O-, -NRH-P(O)2-O-, -O-P(O,NRH)-O-, -O-PO(R")-O-,
-O-PO(CH3)-O-, and -O-PO(NHRN)-O-, where RH is selected from hydrogen and C1-4-alkyl,
and R" is selected from C1-6-alkyl and phenyl, are especially desirable. Further illustrative
examples are given in Mesmaeker et al., Current Opinion in Structural Biology 1995, 5,
343-355 and Susan M. Freier and Karl-Heinz Altmann, Nucleic Acids Research, 1997, vol. 25, pp
4429-4443. The left-hand side of the internucleoside linkage is bound to the 5-membered ring
as substituent P* at the 3'-position, whereas the right-hand side is bound to the 5'-position of a
preceding monomer.

By "succeeding monomer" is meant the neighboring monomer in the 5'-terminal
direction, and by "preceding monomer" is meant the neighboring monomer in the 3'-terminal
direction.

By "LNA spiked oligo" is meant an oligonucleotide, such as a DNA oligonucleotide,
wherein at least one unit (and preferably not all units) has been substituted by the corresponding
LNA nucleoside monomer.

The term "Tm" is used in reference to the "melting temperature." The melting
temperature is the temperature at which 50% of a population of double-stranded nucleic acid
molecules becomes dissociated into single strands. The equation for calculating the Tm of
nucleic acids is well-known in the art. The Tm of a hybrid nucleic acid is often estimated using a
formula adopted from hybridization assays in 1 M salt, and commonly used for calculating Tm
for PCR primers: Tm = [(number of A+T) x 2°C + (number of G+C) x 4°C]. C. R. Newton et al.,
PCR, 2nd Ed., Springer-Verlag (New York: 1997), p. 24. This formula was found to be
inaccurate for primers longer than 20 nucleotides. Id. Other more sophisticated computations
exist in the art which take structural as well as sequence characteristics into account for the
calculation of Tm. A calculated Tm is merely an estimate; the optimum temperature is
commonly determined empirically.
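
As a worked illustration of the 2°C/4°C estimate quoted above, the following short Python
sketch counts bases and applies the formula. The function name is hypothetical, and the
sketch assumes a sequence containing only unmodified A, C, G, and T; it is an estimate of the
kind the text describes, not a substitute for an empirical measurement.

```python
def estimate_tm_2_4_rule(sequence: str) -> int:
    """Estimate Tm in degrees C as 2 x (number of A+T) + 4 x (number of G+C)."""
    seq = sequence.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unsupported bases: {sorted(invalid)}")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# A 9-mer with 5 A/T and 4 G/C bases: 5 x 2 + 4 x 4
print(estimate_tm_2_4_rule("GTGAAATGC"))
```

For the 9-mer GTGAAATGC (five A/T bases and four G/C bases) the estimate is 26°C. As noted
above, the formula becomes unreliable for sequences longer than about 20 nucleotides.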

A nucleic acid compound that has a Tm differential of a specified amount (e.g., less than
15, 12, 10, 8, 6, 4, 2, or 1°C) means the nucleic acid exhibits that specified Tm differential when
incorporated into a specified 9-mer oligonucleotide with respect to the four complementary
variants, as defined immediately below.

Unless otherwise indicated, a Tm differential provided by a particular modified base is
calculated by the following protocol (steps a) through d)):

a) incorporating the modified base of interest into the following oligonucleotide
5'-d(GTGAMATGC), wherein M is the modified base;

b) mixing 1.5 x 10^-6 M of the oligonucleotide having incorporated therein the
modified base with each of 1.5 x 10^-6 M of the four oligonucleotides having the sequence
3'-d(CACTYTACG), wherein Y is A, C, G, T, respectively, in a buffer of 10 mM sodium
phosphate, 100 mM sodium chloride, 0.1 mM EDTA, pH 7.0;

c) allowing the oligonucleotides to hybridize; and

d) detecting the Tm for each of the four hybridized nucleotides by heating the
hybridized nucleotides and observing the temperature at which the maximum of the first
derivative of the melting curve recorded at a wavelength of 260 nm is obtained.

Unless otherwise indicated, a Tm differential for a particular modified base is
determined by subtracting the lowest Tm value determined in steps a) through d) immediately
above from the highest Tm value determined by steps a) through d) immediately above.
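
The arithmetic of the Tm differential can be sketched in Python as the spread between the
highest and lowest of the four measured Tm values, which is the reading consistent with the
"or less" thresholds used in this description. The function name is hypothetical, and the
numbers in the usage example are invented solely to show the calculation; they are not
measured data.

```python
from typing import Mapping


def tm_differential(tm_by_opposing_base: Mapping[str, float]) -> float:
    """Return highest minus lowest Tm across the four duplexes in which the
    modified base M is placed opposite Y = A, C, G, and T."""
    if set(tm_by_opposing_base) != {"A", "C", "G", "T"}:
        raise ValueError("expected exactly one Tm value for each of A, C, G, T")
    values = list(tm_by_opposing_base.values())
    return max(values) - min(values)


# Invented example values in degrees C, for illustration only.
example = {"A": 24.0, "C": 22.5, "G": 25.0, "T": 23.0}
print(tm_differential(example))
```

With these invented values the differential is 25.0 - 22.5 = 2.5°C, so such a base would fall
within the "4°C or less" threshold but not the "2°C or less" threshold.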

By "variance in Tm" is meant the variance in the values of the melting temperatures for a
population of nucleic acids. The Tm for each nucleic acid is determined by experimentally
measuring or computationally predicting the temperature at which 50% of a population of
double-stranded molecules with the sequence of the nucleic acid becomes dissociated into single
strands. For a nucleic acid with only A, T, C, G, and/or U bases, the Tm is the temperature at
which 50% of a population of 100% complementary double-stranded molecules with the
sequence of the nucleic acid becomes dissociated into single strands. For determining the Tm
variance when a nucleic acid has one or more nucleobases other than A, T, C, G, or U, the Tm
of this "modified" nucleic acid is approximated by determining the Tm for each possible double
stranded molecule in which one strand is the modified nucleic acid and the other strand has
either A, T, C, or G in each position corresponding to a nucleobase other than A, T, C, G, or U
in the modified nucleic acid. For example, if the modified nucleic acid has the sequence XMX
in which X is 0, 1, or more A, T, C, G, or U bases and M is any other nucleobase or nucleosidic
base, the Tm is calculated for each possible double stranded molecule in which one strand is
XMX and the other strand is X'YX' in which X' is the base complementary to the corresponding
X base and Y is either A, T, C, or G. The average is then calculated for the Tm values for each
possible double stranded molecule (i.e., four possible duplexes per modified nucleobase or
nucleoside base in the modified nucleic acid) and used as the approximate Tm value for the
modified nucleic acid.
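
The averaging procedure described above can be sketched in Python as follows. This is a
minimal illustration under several assumptions that are not part of the description: any
character other than A, T, C, G, or U is treated as a modified position; the opposing strand is
written position-by-position against the probe rather than in reverse orientation; the variance
is taken as the population variance; and `toy_predict_tm` is a deliberately crude, hypothetical
stand-in for a real Tm measurement or prediction method.

```python
from itertools import product
from statistics import mean, pvariance
from typing import Callable, Iterator, Sequence

NATURAL = set("ATCGU")
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "U": "A"}


def opposing_strands(probe: str) -> Iterator[str]:
    """Yield every strand X'YX': the complement at natural positions and
    each of A, T, C, G at every modified position."""
    options = [COMPLEMENT[b] if b in NATURAL else "ATCG" for b in probe.upper()]
    for combo in product(*options):
        yield "".join(combo)


def approximate_tm(probe: str, predict_tm: Callable[[str, str], float]) -> float:
    """Average the Tm over all possible duplexes of the modified probe."""
    return mean(predict_tm(probe.upper(), other) for other in opposing_strands(probe))


def tm_variance(probes: Sequence[str], predict_tm: Callable[[str, str], float]) -> float:
    """Population variance of the (approximate) Tm values of a probe set."""
    return pvariance([approximate_tm(p, predict_tm) for p in probes])


def toy_predict_tm(probe: str, other: str) -> float:
    """Hypothetical predictor: 4 per G/C pair and 2 per A/T pair formed."""
    score = 0
    for p, o in zip(probe, other):
        if COMPLEMENT.get(p) == o:
            score += 4 if p in "GC" else 2
    return float(score)


print(len(list(opposing_strands("GTGAMATGC"))))            # four duplexes for one M
print(approximate_tm("GTGAMATGC", toy_predict_tm))
print(tm_variance(["GTGAMATGC", "GTGAAATGC"], toy_predict_tm))
```

A probe with n modified positions yields 4^n duplexes, so in practice the enumeration is only
feasible for probes with a small number of modified positions.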

By "capture efficiency" is meant the amount of target nucleic acid(s) bound to a
particular nucleic acid or a population of nucleic acids. Standard methods can be used to
calculate the capture efficiency by measuring the amount of bound target nucleic acid(s) and/or
measuring the amount of unbound target nucleic acid(s). The capture efficiency of a nucleic
acid or nucleic acid population of the invention is typically compared to the capture efficiency
of a control nucleic acid or nucleic acid population under the same incubation conditions (e.g.,
using same buffer and temperature).

For example, a control nucleic acid may have β-D-2-deoxyribose instead of one or more
bicyclic or sugar groups of an LNA unit or other modified or non-naturally-occurring units in a
nucleic acid of the invention. In some embodiments, the nucleic acid of the invention and the
control nucleic acid only have naturally-occurring nucleobases. If a nucleic acid of the
invention has one or more non-naturally-occurring nucleobases, the capture efficiency of the
corresponding control nucleic acid is calculated as the average capture efficiency for all of the
nucleic acids that have either A, T, C, or G in each position corresponding to a
non-naturally-occurring nucleobase in the nucleic acid of the invention.

Monomers are referred to as being "complementary" if they contain nucleobases that
can form hydrogen bonds according to Watson-Crick base-pairing rules (e.g., G with C, A with
T, or A with U) or other hydrogen bonding motifs such as for example diaminopurine with T,
inosine with C, and pseudoisocytosine with G.

By "substantial complementarity" is meant having a sequence that is at least 60, 70, 80, 
90, 95, or 100% complementary to that of another sequence. Sequence complementarity is 
typically measured using sequence analysis software with the default parameters specified 
therein (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University 
of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, WI 53705). This 
software program matches similar sequences by assigning degrees of homology to various 
substitutions, deletions, and other modifications. 
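
As a rough illustration of a percent-complementarity figure, the sketch below scores two equal-length sequences position by position using only the Watson-Crick pairs named above. It is not the cited sequence analysis package and does not model gaps, substitution scoring, or the alternative hydrogen bonding motifs; the function name and the ungapped antiparallel alignment are assumptions made for the example.

```python
# Watson-Crick pairs listed in the definition of "complementary".
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
}


def percent_complementarity(probe, target):
    """Percent of positions that pair when the probe (written 5'->3') is
    aligned antiparallel to the target (also written 5'->3').

    Assumes equal-length sequences and no gaps (a simplification).
    """
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    target_reversed = target.upper()[::-1]
    pairs = zip(probe.upper(), target_reversed)
    matches = sum((p, t) in WATSON_CRICK for p, t in pairs)
    return 100.0 * matches / len(probe)


if __name__ == "__main__":
    print(percent_complementarity("ATGC", "GCAT"))  # fully complementary pair
    print(percent_complementarity("ATGC", "GCAA"))  # one mismatched position
```

A value returned by this sketch could then be compared against the 60, 70, 80, 90, 95, or 100% thresholds in the definition.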

The term "homology" refers to a degree of complementarity. There can be partial 
homology or complete homology (i.e., identity). A partially complementary sequence that at 
least partially inhibits a completely complementary sequence from hybridizing to a target 
nucleic acid is referred to using the functional term "substantially homologous." 

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or 
genomic clone, the term "substantially homologous" refers to a probe that can hybridize to a 
strand of the double-stranded nucleic acid sequence under conditions of low stringency, e.g. 
using a hybridization buffer comprising 20% formamide in 0.8M saline/0.08M sodium citrate 
(SSC) buffer at a temperature of 37°C and remaining bound when subject to washing once with 
that SSC buffer at 37°C. 

When used in reference to a single-stranded nucleic acid sequence, the term 
"substantially homologous" refers to a probe that can hybridize to (i.e., is the complement of) 
the single-stranded nucleic acid template sequence under conditions of low stringency, e.g. 
using a hybridization buffer comprising 20% formamide in 0.8M saline/0.08M sodium citrate 
(SSC) buffer at a temperature of 37°C and remaining bound when subject to washing once with 
that SSC buffer at 37°C. 

By "internal probe" is meant a nucleic acid (e.g., a probe or primer) that hybridizes to 
either only one exon or only one intron of a nucleic acid (e.g., mRNA). The internal probe may 
hybridize to the 5' end of the exon or intron, the 3' end of the exon or intron, or between the 5' 
end and the 3' end of the exon or intron. Desirably, the internal probe is at least 90, 95, 96, 97, 
98, 99, or 100% identical to the corresponding region of a target nucleic acid. 

By "merged probe" is meant a nucleic acid (e.g., a probe or primer) that hybridizes to 
more than one exon and/or intron of a nucleic acid (e.g., mRNA). Desirably, the merged probe 
hybridizes to two consecutive exons (e.g., exons in a mature mRNA transcript that may or may 
not be consecutive in the corresponding DNA molecule). In another desirable embodiment, the 
merged probe hybridizes to an exon and the consecutive intron. In desirable embodiments, the 
merged probe hybridizes to the same number of nucleotides in each exon or to the same number 
of nucleotides in the exon and intron. In various embodiments, the length of the region of the 
merged probe that hybridizes to one exon differs by less than 60, 40, 20, 10, or 5% from the 
length of the region of the merged probe that hybridizes to the other exon or to the intron. 
Desirably, the merged probe is at least 90, 95, 96, 97, 98, 99, or 100% identical to the 
corresponding region of a target nucleic acid. 
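
The length-balance criterion for a merged probe can be illustrated with a short calculation. The text does not state which length the percentage is taken against, so the sketch below assumes the difference is expressed relative to the longer of the two hybridizing regions; the function names and the default threshold are likewise illustrative assumptions.

```python
def region_length_difference_percent(len_a, len_b):
    """Percent difference between the lengths of the two hybridizing regions.

    Assumption: the difference is taken relative to the longer region.
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("region lengths must be positive")
    return 100.0 * abs(len_a - len_b) / max(len_a, len_b)


def is_balanced(len_a, len_b, threshold_percent=20.0):
    """True if the two regions differ by less than the chosen threshold
    (one of 60, 40, 20, 10, or 5% in the text)."""
    return region_length_difference_percent(len_a, len_b) < threshold_percent


if __name__ == "__main__":
    # A 50-mer probe split 25/25 across an exon-exon junction.
    print(region_length_difference_percent(25, 25))
    # A 50-mer probe split 30/20 across the junction.
    print(is_balanced(30, 20, threshold_percent=40.0))
```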

By "poly-T20 tail" is meant an LNA polymer consisting of 20 LNA-T units often added 
as a tail to a nucleic acid sequence. 

By "mixmer" or "mixmer probe" is meant a nucleic acid (e.g., a probe or primer) that 
contains at least one LNA unit and at least one RNA or DNA unit (e.g., a naturally-occurring 
RNA or DNA unit). 

By "corresponding unmodified reference nucleobase" is meant a nucleobase that is not 
part of an LNA unit and is in the same orientation as the nucleobase in an LNA unit. 

By "mutation" is meant an alteration in a naturally-occurring or reference nucleic acid 
sequence, such as an insertion, deletion, frameshift mutation, silent mutation, nonsense 
mutation, or missense mutation. Desirably, the amino acid sequence encoded by the nucleic 
acid sequence has at least one amino acid alteration from a naturally-occurring sequence. 

By "selecting" is meant substantially partitioning a molecule from other molecules in a 
population. Desirably, the partitioning provides at least a 2-fold, desirably, a 30-fold, more 
desirably, a 100-fold, and most desirably, a 1,000-fold enrichment of a desired molecule 
relative to undesired molecules in a population following the selection step. The selection step 
may be repeated a number of times, and different types of selection steps may be combined in a 
given approach. The population desirably contains at least 10^9 molecules, more desirably at 
least 10^11, 10^1[illegible], or 10^14 molecules and, most desirably, at least 10^15 molecules. 
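
The fold enrichment achieved by a selection step can be computed from the ratio of desired to undesired molecules before and after the step. The sketch below is one common way to express this and is offered as an assumption, since the text does not give a formula; the function name and the example counts are hypothetical.

```python
def fold_enrichment(desired_before, undesired_before, desired_after, undesired_after):
    """Fold enrichment of desired over undesired molecules across one
    selection step: (desired/undesired after) / (desired/undesired before).

    The ratio is rearranged into a single division to limit rounding error.
    """
    if min(desired_before, undesired_before, desired_after, undesired_after) <= 0:
        raise ValueError("all counts must be positive")
    return (desired_after * undesired_before) / (undesired_after * desired_before)


if __name__ == "__main__":
    # Hypothetical counts: 1 desired per 1,000 undesired before selection,
    # 1 desired per 10 undesired afterwards.
    print(fold_enrichment(1, 1000, 1, 10))
```

Because enrichment factors multiply, repeating a step or combining different selection steps, as the definition allows, gives an overall enrichment equal to the product of the per-step values.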

By a "population" is meant more than one nucleic acid. A "population" according to the 
invention desirably means more than 10^1, 10^2, 10^3, or 10^4 different molecules. 

By "photochemically active groups" is meant compounds which are able to undergo 
chemical reactions upon irradiation with light. Illustrative examples of functional groups are 
quinones, especially 6-methyl-1,4-naphthoquinone, anthraquinone, naphthoquinone, and 1,4-dimethyl-anthraquinone, 
diazirines, aromatic azides, benzophenones, psoralens, diazo 
compounds, and diazirino compounds. 

By "thermochemically reactive group" is meant a functional group which is able to 
undergo thermochemically-induced covalent bond formation with other groups. Illustrative 
examples of functional parts of thermochemically reactive groups are carboxylic acids, 
carboxylic acid esters such as activated esters, carboxylic acid halides such as acid fluorides, 
acid chlorides, acid bromides, acid iodides, carboxylic acid azides, carboxylic acid hydrazides, 
sulfonic acids, sulfonic acid esters, sulfonic acid halides, semicarbazides, thiosemicarbazides, 
aldehydes, ketones, primary alcohols, secondary alcohols, tertiary alcohols, phenols, alkyl 
halides, thiols, disulphides, primary amines, secondary amines, tertiary amines, hydrazines, 
epoxides, maleimides, and boronic acid derivatives. 

By "chelating group" is meant a molecule that contains more than one binding site and 
frequently binds to another molecule, atom, or ion through more than one binding site at the 
same time. Examples of functional parts of chelating groups are iminodiacetic acid, 
nitrilotriacetic acid, ethylenediamine tetraacetic acid (EDTA), and aminophosphonic acid. 

By "reporter group" is meant a group which is detectable either by itself or as a part of 
a detection series. Examples of functional parts of reporter groups are biotin, digoxigenin, 
fluorescent groups (e.g., groups which are able to absorb electromagnetic radiation, e.g. light or 
X-rays, of a certain wavelength, and which subsequently reemit the energy absorbed as 
radiation of longer wavelength; such as dansyl (5-dimethylamino-1-naphthalenesulfonyl), 
DOXYL (N-oxyl-4,4-dimethyloxazolidine), PROXYL (N-oxyl-2,2,5,5-tetramethylpyrrolidine), 
TEMPO (N-oxyl-2,2,6,6-tetramethylpiperidine), dinitrophenyl, acridines, coumarins, Cy3 and 
Cy5 (trademarks for Biological Detection Systems, Inc.), erythrosine, coumaric acid, 
umbelliferone, Texas red, rhodamine, tetramethyl rhodamine, Rox, 7-nitrobenzo-2-oxa-1-diazole 
(NBD), pyrene, fluorescein, Europium, Ruthenium, Samarium, and other rare earth 
metals), radioisotopic labels, chemiluminescence labels (i.e., labels that are detectable via the 
emission of light during a chemical reaction), spin labels (a free radical (e.g., substituted organic 
nitroxides) or other paramagnetic probes (e.g., Cu2+ or Mg2+) bound to a biological molecule 
being detectable by the use of electron spin resonance spectroscopy), enzymes (such as 
peroxidases, alkaline phosphatases, β-galactosidases, and glucose oxidases), antigens, 
antibodies, haptens (e.g., groups which are able to combine with an antibody, but which cannot 
initiate an immune response by themselves, such as peptides and steroid hormones), carrier systems 
for cell membrane penetration, fatty acid units, steroid moieties (cholesteryl), vitamin A, 
vitamin D, vitamin E, folic acid peptides for specific receptors, groups for mediating 
endocytosis, epidermal growth factor (EGF), bradykinin, and platelet derived growth factor 
(PDGF). Especially desirable groups are biotin, fluorescein, Texas Red, rhodamine, 
dinitrophenyl, digoxigenin, Ruthenium, Europium, Cy5, and Cy3. 

By "ligand" is meant a compound which binds. Ligands can comprise functional groups 
such as aromatic groups (such as benzene, pyridine, naphthalene, anthracene, and 
phenanthrene), heteroaromatic groups (such as thiophene, furan, tetrahydrofuran, pyridine, 
dioxane, and pyrimidine), carboxylic acids, carboxylic acid esters, carboxylic acid halides, 
carboxylic acid azides, carboxylic acid hydrazides, sulfonic acids, sulfonic acid esters, sulfonic 
acid halides, semicarbazides, thiosemicarbazides, aldehydes, ketones, primary alcohols, 
secondary alcohols, tertiary alcohols, phenols, alkyl halides, thiols, disulphides, primary 
amines, secondary amines, tertiary amines, hydrazines, epoxides, maleimides, C1-C20 alkyl 
groups optionally interrupted or terminated with one or more heteroatoms such as oxygen 
atoms, nitrogen atoms, and/or sulphur atoms, optionally containing aromatic or 
mono/polyunsaturated hydrocarbons, polyoxyethylene such as polyethylene glycol, 
oligo/polyamides such as poly-β-alanine, polyglycine, polylysine, peptides, 
oligo/polysaccharides, oligo/polyphosphates, toxins, antibiotics, cell poisons, and steroids. 
"Affinity ligands" include functional groups or biomolecules that have a specific affinity for 
sites on particular proteins, antibodies, poly- and oligosaccharides, and other biomolecules. 

It should be understood that the above-mentioned specific examples under DNA 
intercalators, photochemically active groups, thermochemically active groups, chelating groups, 
reporter groups, and ligands correspond to the "active/functional" part of the groups in question. 
For the person skilled in the art it is furthermore clear that DNA intercalators, photochemically 
active groups, thermochemically active groups, chelating groups, reporter groups, and ligands 
are typically represented in the form M-K- where M is the "active/functional" part of the group 
in question and where K is a spacer through which the "active/functional" part is attached to the 
5- or 6-membered ring. Thus, it should be understood that the group B, in the case where B is 
selected from DNA intercalators, photochemically active groups, thermochemically active 
groups, chelating groups, reporter groups, and ligands, has the form M-K-, where M is the 
"active/functional" part of the DNA intercalator, photochemically active group, 
thermochemically active group, chelating group, reporter group, and ligand, respectively, and 
where K is an optional spacer comprising 1-50 atoms, desirably 1-30 atoms, in particular 1-15 
atoms, between the 5- or 6-membered ring and the "active/functional" part. 

By "spacer" is meant a thermochemically and photochemically non-active distance-making 
group and is used to join two or more different moieties of the types defined above. 
Spacers are selected on the basis of a variety of characteristics including their hydrophobicity, 
hydrophilicity, molecular flexibility and length (e.g., Hermanson et al., "Immobilized Affinity 
Ligand Techniques," Academic Press, San Diego, California (1992)). Generally, the length of 
the spacers is less than or about 400 Å, in some applications desirably less than 100 Å. The 
spacer, thus, comprises a chain of carbon atoms optionally interrupted or terminated with one or 
more heteroatoms, such as oxygen atoms, nitrogen atoms, and/or sulphur atoms. Thus, the 
spacer K may comprise one or more amide, ester, amino, ether, and/or thioether functionalities, 
and optionally aromatic or mono/polyunsaturated hydrocarbons, polyoxyethylene such as 
polyethylene glycol, oligo/polyamides such as poly-β-alanine, polyglycine, polylysine, 
peptides, oligosaccharides, or oligo/polyphosphates. Moreover the spacer may consist of 
combined units thereof. The length of the spacer may vary, taking into consideration the 
desired or necessary positioning and spatial orientation of the "active/functional" part of the 
group in question in relation to the 5- or 6-membered ring. In particular embodiments, the 
spacer includes a chemically cleavable group. Examples of such chemically cleavable groups 
include disulphide groups cleavable under reductive conditions and peptide fragments cleavable 
by peptidases. 

By "target nucleic acid" or "nucleic acid target" is meant a particular nucleic acid 
sequence of interest. Thus, the "target" can exist in the presence of other nucleic acid 
molecules or within a larger nucleic acid molecule. 

By "solid support" is meant any rigid or semi-rigid material to which a nucleic acid 
binds or is directly or indirectly attached. The support can be any porous or non-porous water 
insoluble material, including without limitation, membranes, filters, chips, slides, wafers, fibers, 
magnetic or nonmagnetic beads, gels, tubing, strips, plates, rods, polymers, particles, 
microparticles, capillaries, and the like. The support can have a variety of surface forms, such 
as wells, trenches, pins, channels and pores. 

By an "array" is meant a fixed pattern of at least two different immobilized nucleic 
acids on a solid support. Desirably, the array includes at least 10^2, more desirably, at least 10^3, 
and, most desirably, at least 10^4 different nucleic acids. 

By "antisense nucleic acid" is meant a nucleic acid, regardless of length, that is 
complementary to a coding strand or mRNA of interest. In some embodiments, the antisense 
molecule inhibits the expression of only one nucleic acid, and in other embodiments, the 
antisense molecule inhibits the expression of more than one nucleic acid. Desirably, the 
antisense nucleic acid decreases the expression or biological activity of a nucleic acid or 
encoded protein by at least 20, 40, 50, 60, 70, 80, 90, 95, or 100%. An antisense molecule can 
be introduced, e.g., to an individual cell or to whole animals; for example, it may be introduced 
systemically via the bloodstream. Desirably, a region of the antisense nucleic acid or the entire 
antisense nucleic acid is at least 70, 80, 90, 95, 98, or 100% complementary to a coding 
sequence, regulatory region (5' or 3' untranslated region), or an mRNA of interest. Desirably, 
the region of complementarity includes at least 5, 10, 20, 30, 50, 75, 100, 200, 500, 1000, 2000 
or 5000 nucleotides or includes all of the nucleotides in the antisense nucleic acid. 

In some embodiments, the antisense molecule is less than 200, 150, 100, 75, 50, or 25 
nucleotides in length. In other embodiments, the antisense molecule is less than 50,000; 
10,000; 5,000; or 2,000 nucleotides in length. In certain embodiments, the antisense molecule 
is at least 200, 300, 500, 1000, or 5000 nucleotides in length. In some embodiments, the 
number of nucleotides in the antisense molecule is contained in one of the following ranges: 
5-15 nucleotides, 16-20 nucleotides, 21-25 nucleotides, 26-35 nucleotides, 36-45 nucleotides, 
46-60 nucleotides, 61-80 nucleotides, 81-100 nucleotides, 101-150 nucleotides, or 151-200 
nucleotides, inclusive. In addition, the antisense molecule may contain a sequence that is less 
than a full-length sequence or may contain a full-length sequence. 

By "double stranded nucleic acid" is meant a nucleic acid containing a region of two or 
more nucleotides that are in a double stranded conformation. In various embodiments, the 
double stranded nucleic acid consists entirely of LNA units or a mixture of LNA units, 
ribonucleotides, and/or deoxynucleotides. The double stranded nucleic acid may be a single 
molecule with a region of self-complementarity such that nucleotides in one segment of the 
molecule base-pair with nucleotides in another segment of the molecule. Alternatively, the 
double stranded nucleic acid may include two different strands that have a region of 
complementarity to each other. Desirably, the regions of complementarity are at least 70, 80, 
90, 95, 98, or 100% complementary. Desirably, the region of the double stranded nucleic acid 
that is present in a double stranded conformation includes at least 5, 10, 20, 30, 50, 75, 100, 200, 
500, 1000, 2000 or 5000 nucleotides or includes all of the nucleotides in the double stranded 
nucleic acid. Desirable double stranded nucleic acid molecules have a strand or region that is at 
least 70, 80, 90, 95, 98, or 100% identical to a coding region or a regulatory sequence (e.g., a 
transcription factor binding site, a promoter, or a 5' or 3' untranslated region) of a nucleic acid 
of interest. In some embodiments, the double stranded nucleic acid is less than 200, 150, 100, 
75, 50, or 25 nucleotides in length. In other embodiments, the double stranded nucleic acid is 
less than 50,000; 10,000; 5,000; or 2,000 nucleotides in length. In certain embodiments, the 
double stranded nucleic acid is at least 200, 300, 500, 1000, or 5000 nucleotides in length. In 
some embodiments, the number of nucleotides in the double stranded nucleic acid is contained 
in one of the following ranges: 5-15 nucleotides, 16-20 nucleotides, 21-25 nucleotides, 26-35 
nucleotides, 36-45 nucleotides, 46-60 nucleotides, 61-80 nucleotides, 81-100 nucleotides, 
101-150 nucleotides, or 151-200 nucleotides, inclusive. In addition, the double stranded nucleic 
acid may contain a sequence that is less than a full-length sequence or may contain a full-length 
sequence. 
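
The notion of a single molecule with a region of self-complementarity can be illustrated with a simple string search. The sketch below handles DNA letters only and looks for an exactly complementary stem separated by a minimum loop; the function names, the `min_stem` and `min_loop` parameters, and the requirement of perfect pairing are assumptions made for the example and are not taken from the text.

```python
# Complement table for DNA letters only (a simplification).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def has_self_complementary_region(seq, min_stem=4, min_loop=3):
    """True if some segment of `min_stem` nucleotides can base-pair with a
    later segment of the same molecule, leaving at least `min_loop`
    unpaired nucleotides between them (hypothetical parameters)."""
    seq = seq.upper()
    for start in range(len(seq) - 2 * min_stem - min_loop + 1):
        stem = seq[start:start + min_stem]
        downstream = seq[start + min_stem + min_loop:]
        if reverse_complement(stem) in downstream:
            return True
    return False


if __name__ == "__main__":
    print(has_self_complementary_region("GGGGAAAACCCC"))  # GGGG can pair with CCCC
    print(has_self_complementary_region("AAAAAAAAAAAA"))  # no complementary segment
```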

In some embodiments, the double stranded nucleic acid inhibits the expression of only 
one nucleic acid, and in other embodiments, the double stranded nucleic acid molecule inhibits 
the expression of more than one nucleic acid. Desirably, the nucleic acid decreases the 
expression or biological activity of a nucleic acid of interest or a protein encoded by a nucleic 
acid of interest by at least 20, 40, 50, 60, 70, 80, 90, 95, or 100%. A double stranded nucleic 
acid can be introduced, e.g., to an individual cell or to whole animals; for example, it may be 
introduced systemically via the bloodstream. 

In various embodiments, the double stranded nucleic acid or antisense molecule 
includes one or more LNA nucleotides, one or more universal bases, and/or one or more 
modified nucleotides in which the 2' position in the sugar (e.g., ribose or xylose) contains a 
halogen (such as a fluorine group) or contains an alkoxy group (such as a methoxy group) which 
increases the half-life of the double stranded nucleic acid or antisense molecule in vitro or in 
vivo compared to the corresponding double stranded nucleic acid or antisense molecule in 
which the corresponding 2' position contains a hydrogen or a hydroxyl group. In yet other 
embodiments, the double stranded nucleic acid or antisense molecule includes one or more 
linkages between adjacent nucleotides other than a naturally-occurring phosphodiester linkage. 
Examples of such linkages include phosphoramide, phosphorothioate, and phosphorodithioate 
linkages. Desirably, the double stranded nucleic acid or antisense molecule is purified. 

By "purified" is meant separated from other components that naturally accompany it. 
Typically, a factor is substantially pure when it is at least 50%, by weight, free from proteins, 
antibodies, and naturally-occurring organic molecules with which it is naturally associated. 
Desirably, the factor is at least 75%, more desirably, at least 90%, and most desirably, at least 
99%, by weight, pure. A substantially pure factor may be obtained by chemical synthesis, 
separation of the factor from natural sources, or production of the factor in a recombinant host 
cell that does not naturally produce the factor. Nucleic acids and proteins may be purified by 
one skilled in the art using standard techniques such as those described by Ausubel et al. 
(Current Protocols in Molecular Biology, John Wiley & Sons, New York, 2000). The factor is 
desirably at least 2, 5, or 10 times as pure as the starting material, as measured using 
polyacrylamide gel electrophoresis, column chromatography, optical density, HPLC analysis, 
or western analysis (Ausubel et al., supra). Desirable methods of purification include 
immunoprecipitation, column chromatography such as immunoaffinity chromatography, 
magnetic bead immunoaffinity purification, and panning with a plate-bound antibody. 

By "treating, stabilizing, or preventing a disease, disorder, or condition" is meant 
preventing or delaying an initial or subsequent occurrence of a disease, disorder, or condition; 
increasing the disease-free survival time between the disappearance of a condition and its 
reoccurrence; stabilizing or reducing an adverse symptom associated with a condition; or 
inhibiting or stabilizing the progression of a condition. Desirably, at least 20, 40, 60, 80, 90, or 
95% of the treated subjects have a complete remission in which all evidence of the disease 
disappears. In another desirable embodiment, the length of time a patient survives after being 
diagnosed with a condition and treated with a nucleic acid of the invention is at least 20, 40, 60, 
80, 100, 200, or even 500% greater than (i) the average amount of time an untreated patient 
survives or (ii) the average amount of time a patient treated with another therapy survives. 

By "treating, stabilizing, or preventing cancer" is meant causing a reduction in the size 
of a tumor, slowing or preventing an increase in the size of a tumor, increasing the disease-free 
survival time between the disappearance of a tumor and its reappearance, preventing an initial 
or subsequent occurrence of a tumor, or reducing an adverse symptom associated with a tumor. 
In one desirable embodiment, the number of cancerous cells surviving the treatment is at least 
20, 40, 60, 80, or 100% lower than the initial number of cancerous cells, as measured using any 
standard assay. Desirably, the decrease in the number of cancerous cells induced by 
administration of a nucleic acid of the invention (e.g., a nucleic acid with substantial 
complementarity to a nucleic acid associated with cancer such as an oncogene) is at least 2, 5, 
10, 20, or 50-fold greater than the decrease in the number of non-cancerous cells. In yet 
another desirable embodiment, the number of cancerous cells present after administration of a 
nucleic acid of the invention is at least 2, 5, 10, 20, or 50-fold lower than the number of 
cancerous cells present prior to the administration of the compound or after administration of a 
buffer control. Desirably, the methods of the present invention result in a decrease of 20, 40, 
60, 80, or 100% in the size of a tumor as determined using standard methods. Desirably, at 
least 20, 40, 60, 80, 90, or 95% of the treated subjects have a complete remission in which all 
evidence of the cancer disappears. Desirably, the cancer does not reappear or reappears after at 
least 5, 10, 15, or 20 years. 

Exemplary cancers that can be treated, stabilized, or prevented using the above methods 
include prostate cancers, breast cancers, ovarian cancers, pancreatic cancers, gastric cancers, 
bladder cancers, salivary gland carcinomas, gastrointestinal cancers, lung cancers, colon 
cancers, melanomas, brain tumors, leukemias, lymphomas, and carcinomas. Benign tumors 
may also be treated or prevented using the methods and nucleic acids of the present invention. 

By "infection" is meant the invasion of a host animal by a pathogen (e.g., a bacterium, 
yeast, or virus). For example, the infection may include the excessive growth of a pathogen 
that is normally present in or on the body of an animal or growth of a pathogen that is not 
normally present in or on the animal. More generally, an infection can be any situation in which 
the presence of a pathogen population(s) is damaging to a host. Thus, an animal is "suffering" 
from an infection when an excessive amount of a pathogen population is present in or on the 
animal's body, or when the presence of a pathogen population(s) is damaging the cells or other 
tissue of the animal. In one embodiment, the number of a particular genus or species of 
pathogen is at least 2, 4, 6, or 8 times the number normally found in the animal. 

A bacterial infection may be due to gram positive and/or gram negative bacteria. In 
desirable embodiments, the bacterial infection is due to one or more of the following bacteria: 
Chlamydophila pneumoniae, C. psittaci, C. abortus, Chlamydia trachomatis, Simkania 
negevensis, Parachlamydia acanthamoebae, Pseudomonas aeruginosa, P. alcaligenes, P. 
chlororaphis, P. fluorescens, P. luteola, P. mendocina, P. monteilii, P. oryzihabitans, P. 
pertucinogena, P. pseudalcaligenes, P. putida, P. stutzeri, Burkholderia cepacia, Aeromonas 
hydrophila, Escherichia coli, Citrobacter freundii, Salmonella typhimurium, S. typhi, S. 
paratyphi, S. enteritidis, Shigella dysenteriae, S. flexneri, S. sonnei, Enterobacter cloacae, E. 
aerogenes, Klebsiella pneumoniae, K. oxytoca, Serratia marcescens, Francisella tularensis, 
Morganella morganii, Proteus mirabilis, Proteus vulgaris, Providencia alcalifaciens, P. 
rettgeri, P. stuartii, Acinetobacter calcoaceticus, A. haemolyticus, Yersinia enterocolitica, Y. 
pestis, Y. pseudotuberculosis, Y. intermedia, Bordetella pertussis, B. parapertussis, B. 
bronchiseptica, Haemophilus influenzae, H. parainfluenzae, H. haemolyticus, H. 
parahaemolyticus, H. ducreyi, Pasteurella multocida, P. haemolytica, Branhamella catarrhalis, 
Helicobacter pylori, Campylobacter fetus, C. jejuni, C. coli, Borrelia burgdorferi, V. cholerae, 
V. parahaemolyticus, Legionella pneumophila, Listeria monocytogenes, Neisseria gonorrhoeae, 
N. meningitidis, Kingella denitrificans, K. kingae, K. oralis, Moraxella catarrhalis, M. atlantae, 
M. lacunata, M. nonliquefaciens, M. osloensis, M. phenylpyruvica, Gardnerella vaginalis, 
Bacteroides fragilis, Bacteroides distasonis, Bacteroides 3452A homology group, Bacteroides 
vulgatus, B. ovatus, B. thetaiotaomicron, B. uniformis, B. eggerthii, B. splanchnicus, 
Clostridium difficile, Mycobacterium tuberculosis, M. avium, M. intracellulare, M. leprae, C. 
diphtheriae, C. ulcerans, C. accolens, C. afermentans, C. amycolatum, C. argentoratense, C. 
auris, C. bovis, C. confusum, C. coyleae, C. durum, C. falsenii, C. glucuronolyticum, C. 
imitans, C. jeikeium, C. kutscheri, C. kroppenstedtii, C. lipophilum, C. macginleyi, C. 
matruchotii, C. mucifaciens, C. pilosum, C. propinquum, C. renale, C. riegelii, C. sanguinis, C. 
singulare, C. striatum, C. sundsvallense, C. thomssenii, C. urealyticum, C. xerosis, 
Streptococcus pneumoniae, S. agalactiae, S. pyogenes, Enterococcus avium, E. casseliflavus, E. 
cecorum, E. dispar, E. durans, E. faecalis, E. faecium, E. flavescens, E. gallinarum, E. hirae, E. 
malodoratus, E. mundtii, E. pseudoavium, E. raffinosus, E. solitarius, Staphylococcus aureus, 
S. epidermidis, S. saprophyticus, S. intermedius, S. hyicus, S. haemolyticus, S. hominis, and/or 
S. saccharolyticus. Desirably, a nucleic acid is administered in an amount sufficient to prevent, 
stabilize, or inhibit the growth of a pathogenic bacteria or to kill the bacteria. 

In various embodiments, the viral infection relevant to the methods of the invention is 
an infection by one or more of the following viruses: West Nile virus (e.g., Samuel, "Host 
genetic variability and West Nile virus susceptibility," Proc. Natl. Acad. Sci. USA August 21, 
2002; Beasley, Virology 296:17-23, 2002), Hepatitis, picornavirus, polio, HIV, coxsackie, 
herpes (e.g., zoster, simplex, EBV, or CMV), adenovirus, retrovirus, flavi, pox, rhabdovirus, 
picorna virus (e.g., coxsackie, entero, hoof and mouth, polio, or rhinovirus), St. Louis 
encephalitis, Epstein-Barr, myxovirus, JC, coxsackievirus B, togavirus, measles, paramyxovirus, 
echovirus, bunyavirus, cytomegalovirus, varicella-zoster, mumps, equine encephalitis, 
lymphocytic choriomeningitis, rabies, simian virus 40, polyoma virus, parvovirus, papilloma 
virus, primate adenovirus, and/or BK. 

By "mammal in need of treatment" is meant a mammal in which a disease, disorder, or 
condition is treated, stabilized, or prevented by the administration of a nucleic acid of the 
invention. 

Other aspects and embodiments of the invention are in the detailed description and 
claims below. Additionally, other nucleic acids and methods described in U.S.S.N. 10/105,639 
(Jakobsen et al., "Modified Oligonucleotides and Uses Thereof") or U.S.S.N. 60/410,061 
(Ramsing et al., "Populations of Oligonucleotides with Duplex Stabilizing Properties and Uses 
Thereof"), which are hereby incorporated by reference, can be used in the present invention. 



LN A2 l/SKA/MSL 


5/14/2003 




Brief Description of the Drawings 

The application file contains drawings executed in color (Figs. 14, 15, 
16A, 16B, 17, 18A, 18B, 22, 27, 29, 32, 34, 35A, 35B, 36A, 36B, and 37). 
Copies of this patent or patent application with color drawings will be provided 
by the Office upon request and payment of the necessary fee. 

Figure 1 shows the structures of selected nucleotide monomers: DNA (T), LNA (T^L), 
pyrene DNA (Py), 2'-OMe-RNA [2'-OMe(T)], abasic LNA (Ab^L), phenyl LNA (17a), and 
pyrenyl LNA (17d). 

Figure 2 illustrates the chemical structures of Selective Binding Complementary (SBC) 
nucleotides. 

Figure 3 is a table comparing the Tm differential of various DNA oligomers to LNA 
oligomers. 

Figure 4 shows the base-pairing between modified bases and naturally-occurring 
nucleotides (EP 1 072 679). These modified bases may be incorporated as part of an LNA, 
DNA, or RNA unit and used in any of the oligomers of the invention. 

Figure 5 shows the structure of desirable adenosine analogs (WO 97/12896). These 
modified bases may be incorporated as part of an LNA, DNA, or RNA unit and used in any of the 
oligomers of the invention. 

Figure 6 shows the structure of desirable thymine analogs (WO 97/12896). These 
modified bases may be incorporated as part of an LNA, DNA, or RNA unit and used in any of the 
oligomers of the invention. 

Figure 7 shows the structure of desirable guanine analogs (WO 97/12896). These 
modified bases may be incorporated as part of an LNA, DNA, or RNA unit and used in any of the 
oligomers of the invention. 

Figure 8 shows the structure of desirable cytosine analogs (WO 97/12896). These 
modified bases may be incorporated as part of an LNA, DNA, or RNA unit and used in any of 
the oligomers of the invention. 

Figure 9 is a schematic illustration of an exemplary synthesis for 
LNA-furanoPyr-SBC-C. 

Figure 10 is a schematic illustration of the use of nucleic acid probes with secondary 
structure (e.g., hairpins formed due to flanking universal base). The nucleic acid probe contains 
both a fluorophore and a quencher such that the fluorescence intensity of the fluorophore is 
decreased by a nearby quencher. When a test nucleic acid is bound to the capture probe, the 
level of fluorescence intensity increases because of the increased distance between the
fluorophore and quencher. This method allows a test nucleic acid sample to be characterized
and quantitated without being labeled directly. The amount of capture probe in a particular spot 
in the array can also be measured by heating the array to reduce secondary structure (e.g., 
eliminate hairpins) and measuring the fluorescence.

Figure 11 is a schematic illustration of the synthesis of some compounds of the
invention.

Figures 12A and 12B show the sensitivity of 50-mer LNA capture probes compared to
50-mer DNA capture probes. SWI5-specific 50-mer DNA oligonucleotides (green bars) and
50-mer capture probes with an LNA nucleotide incorporated at every third nucleotide position
(red bars) were printed at the oligo concentration indicated below. The slides were hybridized
at 65°C in 3xSSC (Fig. 12A) and at 70°C in 3xSSC (Fig. 12B).

Figures 13A and 13B show the specificity of 40-mer LNA capture probes (red bars)
compared to DNA capture probes (green bars). The hybridizations were carried out at 65°C in
3xSSC. Bars 1 and 7 represent perfectly matched duplexes; bars 2 and 8, 3 and 9, 4 and 10,
5 and 11, and 6 and 12 represent duplexes with 1, 2, 3, 4, and 5 mismatches, respectively. The
in vitro RNA used was SWI5 in Fig. 13A and THI4 in Fig. 13B.

Fig. 14 shows the detection principle for alternative exon skipping in the C. elegans let-2
gene using LNA oligonucleotide capture probes and comparative expression profiling.

Figure 15 shows the detection of alternative splicing of C. elegans let-2 exons 9 and 10
using LNA-modified capture probes.

Figures 16A and 16B show the comparison of DNA and LNA-modified oligonucleotide 
capture probes in the specific capture of the C. elegans gene26/T01D3.3 exon 4. 

Figure 17 illustrates the LNA exon-exon junction (merged) probe concept.

Figures 18A and 18B show the capture probe specificity for the C. elegans
gene26/T01D3.3 exon 4 (Fig. 18A) and exon 5 (Fig. 18B) as validated by short complementary
target oligonucleotides.

Figure 19 shows the construction of the recombinant splice variants in the in vitro 
transcription vector. The small bars show the location of the hybridization for the 
oligonucleotide capture probes used in this example. The sequences of the capture probes are 
described herein.

Figures 20A and 20B are bar graphs showing the detection of splice variants #1 and #2,
respectively, using LNA-modified capture probes in a comparative hybridization.

Fig. 21 shows the sensitivity of 50-mer LNA capture probes compared to 50-mer DNA
capture probes. SWI5-specific 50-mer DNA oligonucleotides were compared with 50-mer capture
probes with an LNA nucleotide incorporated at every second (LNA2) or third (LNA3) nucleotide
position. The slides were hybridized at 65°C in 3xSSC.

Figure 22 is a bar graph of the signal intensities of a patient DNA sample hybridized to 
an array of the invention. The names of the probes in Figs. 22 and 25 match, although the 
numbers used in Fig. 22 are abbreviated, e.g., probe No. 10580 Menkes. 14 50NH2C6-2.LNA in
Fig. 25 corresponds to the second probe counted from the left "14.2" LNA in the lower graph of 
Fig. 22. 

Figure 23 is a graph comparing the spot intensity for probes of the invention with 
different LNA substitution patterns.

Figure 24 is a bar graph of the spot intensity for LNA probes for different exons.

Figure 25 is a table of CGH capture probe sequences. 

Figure 26 is a flow chart of the steps of oligo design software of the invention. The
OligoDesign software features LNA-modified oligonucleotide secondary structure prediction,
LNA-spiked oligo melting temperature prediction, genome-wide cross-hybridization prediction,
secondary structure prediction of the target, and recognition and filtering of the target in the
genome. These features are determined for each possible probe of the query gene and presented
to an artificial neural network. The probes are then ranked according to the neural network
prediction and the top-scoring probes are returned.
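
The enumerate, score, and rank flow described for Figure 26 can be sketched as follows. This is only an illustrative outline, not the OligoDesign code of Figs. 28A-28E: the function names are hypothetical, and toy_score is an assumed stand-in for the neural network prediction that simply prefers a GC fraction near 50%.

```python
from typing import Callable, List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def candidate_probes(gene: str, length: int) -> List[Tuple[int, str]]:
    """Every possible probe of the given length, with its start index."""
    return [(i, gene[i:i + length]) for i in range(len(gene) - length + 1)]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for the neural network prediction.

    Assumption for illustration only: probes score higher the closer
    their GC fraction is to 0.5.
    """
    return 1.0 - abs(gc_fraction(seq) - 0.5)


def rank_probes(gene: str, length: int,
                score: Callable[[str], float],
                top_n: int) -> List[Tuple[float, int, str]]:
    """Score each candidate probe and return the top_n as (score, start, seq)."""
    scored = [(score(seq), start, seq)
              for start, seq in candidate_probes(gene, length)]
    # Highest score first; ties are broken by the earlier start position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]
```

In a fuller version, the score argument would combine the per-probe features listed above (secondary structure, Tm, and cross-hybridization predictions) rather than GC content alone.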

Figure 27 is a schematic illustration of oligo design software of the invention.

Figures 28A-28E contain computer code for exemplary software of the invention.
The oligod program takes a gene sequence as input and returns sequences for LNA-spiked
oligonucleotides (Fig. 28A). The dyp program is used by oligod to predict the secondary
structure and self-annealing properties of the oligonucleotides (Fig. 28B). The
expression_array_param file contains parameters used by the oligod program (Fig. 28C). Fig.
28D contains code for a Tm prediction program, and Fig. 28E contains code for a Tm
thermodynamic model.

Figure 29 illustrates photo-activated immobilization of nucleic acids of the invention,
which enables polarized coupling of anthraquinone (AQ)-linked LNA oligonucleotides onto the
polymer surface. No pretreatment of the slide is needed. A covalent bond is formed between
the oligonucleotide and the polymer using a UV source, e.g., Stratalinker.

Figure 30 illustrates an injection-molded polymer slide. Finger indents ease slide 
handling. The slide has a well-defined printing and hybridization window, frosted surface for 
identification and orientation, and space for barcodes. 

Figure 31 illustrates spot quality on different slides that can be used to immobilize
nucleic acids of the invention. The hydrophobic slide surface ensures that extremely
homogeneous spots are generated when hydrophilic spotting solution is applied to the surface. A
high spot quality is obtained on the EURAY™ polymer slide compared to a glass slide when
using a spot-to-spot distance of 150 µm. The high-quality arrays simplify downstream image
analyses.

Figure 32 is a schematic illustration of a method of the invention.

Figure 33 is a table of exemplary target nucleic acids (Holstege et al. (1998) Cell 95,
717-728, and Causton et al. (2001) Mol. Biol. Cell 12, 323-337).

Figure 34 is a graph of Cy5 intensity. Yeast actin 1-specific 50-mer capture probes
were synthesized as DNA and DNA/LNA mixmer oligonucleotides. LNA mixmer capture
probes contain an LNA at every 4th, 5th, or 6th nucleotide position (LNA_4, LNA_5, and LNA_6,
respectively). On-chip melting profiles demonstrate an 8-10 °C increase in Tm obtained with
LNA capture probes.

Figure 35A illustrates the heat-shock response in yeast. The array was hybridized with
Cy3-labelled standard and Cy5-labelled heat-shock yeast cDNA. Figure 35B also illustrates the
heat-shock response in yeast. The microarray data were normalized using yeast actin 1. The
ssa4 gene encoding heat shock protein HSP70 is up-regulated over 2-fold. Expression of the
gua1 gene is down-regulated.

Figure 36A compares expression of wild-type and ssa4 mutant yeast. The array was
hybridized with Cy3-labelled wild-type and Cy5-labelled ssa4 mutant yeast cDNA. Figure 36B
also compares wild-type and ssa4 yeast. The hybridization data were normalized using yeast
actin 1. ssa4 is detected in the wild-type yeast strain, but not in the ssa4 knock-out strain.

Figure 37 illustrates mRNA splicing. 

Figure 38 is a picture showing gel electrophoresis of fragmented cDNA from the yeast
wild-type strain. The molecular marker (lanes 1 and 9) is from Life Technologies, USA. Lanes
2-8 represent the UDG-fragmented cDNAs 1-7 according to the different dUTP/dTTP ratios in
Table 18.

Figure 39 is a graph of the log ratios of the normalized fluorescence intensities from the
wild-type yeast strain (signal) and those from the Δssa4 yeast strain (noise) as a function of
capture probe position in the 3' region of the SSA4 mRNA.

Figure 40 is a schematic illustration of mRNA splicing.

Figure 41 is a schematic illustration of alternative mRNA splicing. 

Figure 42 is a schematic illustration of probes of the invention. 

Figure 43 is a schematic illustration of probes of the invention. 

Figure 44 illustrates an exemplary computer for use in the methods of the invention.

Detailed Description

Detection and Analysis of mRNA Splice Variants 

Alternative splicing is the process by which different mature messenger RNAs are
produced from the same pre-mRNA. Because the mRNA composition of a given cell determines
the proteins present in the cell, this process is an important aspect of a cell's gene expression
profile. Current investigations of transcriptomes (i.e., the total complexity of RNA transcripts
produced by an organism) indicate that at least 50-60% of the genes of complex eukaryotes
produce more than one splice variant. The present invention provides a novel method for
detecting and quantifying the levels of splice variants in complex mRNA pools using LNA
discriminating probes and high-throughput LNA oligonucleotide microarray technology. The
detection concept, which uses internal LNA exon probes and/or splice-variant-specific
exon-exon junction or exon-intron or intron-exon (so-called merged) probes, is depicted in
Fig. 17.

Internal, exon-specific (or intron-specific) LNA oligonucleotide probes are designed and
used to detect the relative levels of a given exon (or intron) in complex mRNA pools using
oligonucleotide microarray technology or similar techniques. Exon-exon LNA junction probes
are designed for multiple or all possible exon-exon combinations or exon-intron combinations.
The LNA discriminating probes are highly specific and superior to DNA oligos due
to the higher ΔTm of LNA probes. These probes can be used to determine the sequential order
of each sub-element (i.e., exon structure or exon-intron structure) in a given alternatively
spliced mRNA isoform, thus giving the exact composition of the mRNA. Subsequently, the
ratios of each splice variant can be quantified using the combined readouts from both internal
and merged LNA probes and control probes. The invention is applicable in both single-fluor
(single-channel) and comparative two-fluor (two-channel) microarray hybridizations.
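
As a rough illustration of designing merged probes for all possible exon-exon combinations, the following sketch joins the 3' end of an upstream exon to the 5' end of a downstream exon. The function names, the symmetric split around the junction, and the example sequences are assumptions made for illustration. The sequences are written in the mRNA sense, so an actual capture probe would have to be oriented to match the labelled target.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def junction_probe(upstream_exon: str, downstream_exon: str, half: int) -> str:
    """Sequence spanning an exon-exon junction.

    Takes the last `half` bases of the upstream exon and the first `half`
    bases of the downstream exon (an assumed, symmetric layout).
    """
    return upstream_exon[-half:] + downstream_exon[:half]


def all_junction_probes(exons: List[str], half: int) -> Dict[Tuple[int, int], str]:
    """Junction sequences for every ordered pair (i < j) of exons.

    Exons are numbered from 1 in 5'-to-3' order; only pairs that keep this
    order are generated, since splicing preserves exon order.
    """
    probes = {}
    for i, j in combinations(range(len(exons)), 2):
        probes[(i + 1, j + 1)] = junction_probe(exons[i], exons[j], half)
    return probes
```

For a three-exon gene this yields junctions 1-2, 1-3, and 2-3, where the 1-3 junction would report a transcript in which exon 2 has been skipped.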

Several "artificial," alternatively spliced mRNA molecules may be constructed in an in
vitro transcription vector for the production of clean IVT RNA. Both internal and
junction-specific LNA oligonucleotide capture probes are designed, synthesized, and spotted
onto, e.g., Exiqon's polymer microarray platform. The resulting splice-specific microarrays are
used to validate the LNA discriminating probe concept by spiking the in vitro RNAs individually
as well as in different ratios into a complex RNA background for fluorochrome-labelling and
array hybridization.

The internal and merged probes of the invention can also be used in any standard
method for the analysis of mRNA splice variants (see, for example, Yeakley et al., Nature
Biotechnology 20:353-358, 2002; Clark et al., Science 296:907-910, 2002; Mutch et al.,
Genome Biology 2(12):preprint0009.1-0009.31, 2001).

Exemplary Applications of Internal and/or Merged Probes

The internal and/or merged probes of the invention can also be used for gene expression 
profiling of alternative splice variants, oligonucleotide expression microarrays, real-time PCR, 
and profiling of alternatively spliced mRNAs using microtiter plate assays or fiber-optic arrays.

Detection and characterization of alternative splicing is particularly useful for the study
and treatment of human disease (exonhit website, "Inaugural Splicing 2002 Concludes:
Alternative Splicing May Make All the Difference in Discovering the Origin of Disease"). In
particular, RNA splicing is now widely recognized as a means to generate protein diversity.
Alternative splicing is a key mechanism for regulating gene expression, and any mutation or
defect in its regulation can considerably impact cell functions. Therefore, it is likely to be an
important source of novel gene and protein targets implicated in human pathology. Industry
has long recognized the need for innovative discovery technologies that focus on the origin of
disease for the development of novel diagnostics and therapeutics.

In particular, there are many examples of human pathologies caused by alterations in
normal patterns of alternative RNA splicing. Because a large number of human genes undergo
alternative splicing, the protein isoforms that result from this process represent a major source
of targets for commercial development of therapies and diagnostics. In particular, splicing
processes play a significant role in the onset and development of cardiovascular, muscular, and
CNS diseases, and cancer. Early evidence indicates the origin of many diseases can be
identified by examining alternative splicing, which leads to the point of intervention for
discovering future generations of drugs. The study of splicing enables the discovery of new
mechanisms underlying disease progression.

Comparative Genomic Hybridization

Comparative Genomic Hybridization (CGH) is a powerful technology for detection of
unbalanced chromosome rearrangements and holds much promise for screening and
identification of interstitial submicroscopic rearrangements that otherwise cannot be detected
using classical cytogenetic or FISH technologies. The adaptation of CGH onto an oligo
microarray platform allows detection of small single-exon deletions/duplications on a
genome-wide scale. There is a strong need for developing microarrays that can detect, e.g.,
single-exon aberrations. This detection can be achieved by employing LNA mixmer oligos as
capture probes for individual exons in selected genes.

A model system for these methods is the Menkes locus. Menkes disease is a lethal X-linked
recessive disorder associated with copper metabolism disturbance leading to death in early
childhood. The Menkes locus has been mapped to Xq13. The gene spans a genomic region of
about 150 kb, contains 23 exons, and encodes an 8.5 kb gene transcript. The gene for Menkes
disease (now designated as ATP7A) encodes a 1500 amino acid membrane-bound Cu-binding
P-type ATPase (ATP7A). The 8.5 kb transcript is expressed in all tissues from normal
individuals (though only trace amounts are present in liver), but is diminished or absent in
Menkes disease patients. Several different kinds of mutations, such as chromosome aberrations,
point mutations, and partial gene deletions affecting ATP7A, have been identified in Menkes
disease (MD) patients. 50-mer capture probes with LNA spiked in every second, third, and fourth
position have been designed for every exon (23 exons) of ATP7A using the OligoDesign
software tool described herein. The C6-amino-linked capture probes were spotted onto
Immobilizer slides and hybridized with patient samples labelled with Cy3 fluorescent dye and a
known reference genomic sample labelled with Cy5. After mixing equal amounts of the labelled
DNA, the probe is hybridized to the array. The ratio of Cy5 signal to Cy3 for each clone indicates
differences in chromosome/DNA material. For example, the Cy5 signal is higher than Cy3 if
the patient genome has a deletion, and is lower if there is duplication. In regions that are
unchanged, the Cy5:Cy3 ratio is 1:1. These methods can be used to analyze a number of
well-characterized Menkes patients with a range of partial deletions of ATP7A.
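
The interpretation of the Cy5:Cy3 ratio described above can be sketched as a simple per-exon classification. The function name and the log2 threshold of 0.5 are assumptions chosen for illustration; the text gives only the direction of the change (reference Cy5 higher than patient Cy3 for a deletion, lower for a duplication, and 1:1 when unchanged), and real data would first need normalization.

```python
import math


def classify_exon(cy5_reference: float, cy3_patient: float,
                  log2_threshold: float = 0.5) -> str:
    """Classify one exon probe from its reference (Cy5) and patient (Cy3) signals.

    log2_threshold is a hypothetical cutoff, not a value from the source.
    """
    log_ratio = math.log2(cy5_reference / cy3_patient)
    if log_ratio > log2_threshold:
        return "deletion"      # patient signal lower than reference
    if log_ratio < -log2_threshold:
        return "duplication"   # patient signal higher than reference
    return "unchanged"         # ratio close to 1:1
```

Applying the function to the probes for each of the 23 ATP7A exons in turn would give an exon-by-exon profile of a partial deletion.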

LNA oligonucleotide-based CGH makes it possible to assess a large number of
chromosomal aberrations that are being screened for in the cytogenetic clinic. In contrast,
standard FISH analysis typically only detects large chromosomal rearrangements. In desirable
embodiments, an array that contains a series of overlapping probes is used to detect a
chromosomal deletion in a nucleic acid sample, such as a patient sample.

Clinical diagnostics

Clinical diagnosis is a key element in healthcare management and point-of-care. A large
number of analyses in the hospitals are based on the use of robust, cost-efficient, sensitive, and
highly specific diagnostic tests. Thus, the diagnosis of various diseases is performed with a high
selectivity and reliability, resulting in confirmation of medical diagnosis, choice of therapy and
follow-up treatment as well as prevention. In addition to its importance in the quality of
healthcare provided to patients, clinical diagnosis also contributes to the control of healthcare
costs. The field of clinical diagnostics involves analyzing biological fluid samples (blood, urine,
etc.) or biopsies collected from patients in order to establish the diagnosis of diseases, whether
of infectious, metabolic, endocrine or cancerous origin. Medical analysis of infectious diseases
involves testing and identifying the micro-organisms causing the infection, e.g., testing for and
identifying a micro-organism in blood and determining its susceptibility to antibiotics, or
detecting an antigen-antibody reaction produced as a response to an attack by a micro-organism
in the human body, e.g., testing for antibodies for the diagnosis of hepatitis. The accurate
diagnosis of metabolic and endocrine diseases and cancers, resulting in a disease phenotype
with a bodily imbalance, involves the measurement of diagnostic substances or elements
present in the biological fluids or biopsies. These substances are examined and results are
interpreted with reference to known normal values.

Use of diagnostic kits in microbiological control

The pharmaceutical, cosmetics and agri-food industries are being confronted with increasingly
strict quality standards. Thus, the purpose of industrial microbiological control testing is to
detect and measure the presence of potentially pathogenic microbial contaminants throughout
the manufacturing process from raw materials to the finished products, as well as in the
production environment. The obtained results are subsequently compared to the current
regulatory guidelines and industry standards.

Application of molecular biological techniques to in vitro diagnostics 

Recently, several different molecular biological techniques have been used successfully in
accurate quantification of RNA levels in clinical diagnosis as well as in microbiological control.
The applications are wide-ranging and include methods for quantification of the regulation and
expression of drug resistance markers in tumor cells, monitoring of the responses to
chemotherapy, measuring the biodistribution and transcription of gene-encoded therapeutics,
molecular assessment of the tumor stage in a given cancer, detecting circulating tumor cells in
cancer patients and detection of bacterial and viral pathogens. The reverse transcription
polymerase chain reaction (RT-PCR) is the most sensitive method for the detection of mRNA,
including low-abundance mRNAs, often obtained from limited tissue samples in clinical
diagnostics. The application of fluorescence techniques to RT-PCR combined with suitable
instrumentation has led to development of quantitative RT-PCR methods, combining
amplification, detection and quantification in a closed system that avoids contamination and
minimizes hands-on time. The two most commonly used quantitative RT-PCR techniques
are the TaqMan RT-PCR assay (ABI, Foster City, USA) and the LightCycler assay (Roche,
USA). A third method applied to detection and quantification of RNA levels is real-time
nucleic acid sequence based amplification (NASBA) combined with molecular beacon
detection molecules. NASBA is a single-step isothermal RNA-specific amplification method
that amplifies mRNA in a double stranded DNA environment, and this method has recently
proven useful in the detection of various mRNAs and in the detection of both viral and bacterial
RNA in clinical samples. Finally, the recent explosion in microarray technology holds the
promise of using microarrays in clinical diagnostics. For example, van't Veer et al. (Nature
2002: 415, 31) describe the successful use of microarrays in obtaining digital mRNA signatures
from breast tumors and the use of these signatures in the precise prediction of the clinical
outcome of breast cancer in patients.

The success of exploiting molecular biological techniques in diagnostics and diagnostic kits
depends on continuous optimization of the technologies and the development of new robust and
cost-effective technology platforms for producing accurate, reproducible and valid clinical data.
Locked nucleic acid (LNA) oligonucleotides constitute a novel class of bicyclic RNA analogs
having an exceptionally high affinity and specificity toward their complementary DNA and
RNA target molecules. Besides increased thermal stability, LNA-containing oligonucleotides
show significantly increased mismatch discrimination, and allow full control of the melting
temperature across microarray hybridizations. The LNA chemistry is completely compatible
with conventional DNA phosphoramidite chemistry and thus LNA substituted oligonucleotides
can be designed to optimize performance. LNA oligonucleotides would be well-suited for
large-scale clinical studies providing highly accurate genotyping by direct competitive
hybridization of two allele-specific LNA probes to, e.g., microarrays of immobilized patient
amplicons. In addition, the use of LNA substituted oligonucleotides would increase both
sensitivity and specificity in detection and quantification of mRNA levels in clinical samples,
either by quantitative RT-PCR, quantitative NASBA or oligonucleotide microarrays, compared
with DNA probes. Application of LNA oligonucleotides into diagnostic kits would thus
significantly enhance their performance. Finally, the use of LNA substituted oligonucleotides
would increase the sensitivity and specificity in the detection of alternatively spliced mRNA
isoforms and non-coding RNAs either by homogeneous assays (TaqMan assay, LightCycler
assay, NASBA) or by oligonucleotide microarrays in a massively parallel analysis setup.

Optimized Nucleic Acids of the Invention

Decreasing the variation in melting temperatures (Tm) of a population of nucleic acids
allows the nucleic acids to hybridize to target molecules under similar binding conditions,
thereby simplifying the simultaneous hybridization of multiple nucleic acids. Similar melting
temperatures also allow the same hybridization conditions to be used for multiple experiments,
which is particularly useful for assays involving hybridization to nucleic acids of varying "AT"
content. For example, current methods often require less stringent conditions for hybridization
of nucleic acids with high "AT" content compared to nucleic acids with low "AT" content.
Due to this variation in hybridization stringency, current methods may require significant trial
and error to optimize the hybridization conditions for each experiment.

To overcome limitations in current nucleic acid hybridization and/or amplification
techniques, we have developed populations of nucleic acid probes or primers with minimal
variation in melting temperature (U.S.S.N. 60/410,061). For example, the unique properties of
LNA nucleotide analogs increase their binding affinity for DNA and RNA. The stability of
duplexes can generally be ranked as follows: DNA:DNA < DNA:RNA < RNA:RNA <
LNA:DNA < LNA:RNA < LNA:LNA. The DNA:DNA duplex is thus the least stable and the
LNA:LNA duplex the most stable. The affinity of the LNA nucleotides A and T corresponds
approximately to the affinity of DNA G and C to their complementary bases. General
substitution of one or more A and T nucleotides with LNA A and LNA T in DNA
oligonucleotides is therefore a simple way of equalizing differences in Tm. Furthermore, the
mean melting temperature is increased significantly, which is often important for shorter
oligonucleotides. For example, predictions of melting temperature of all possible 9-mer
oligonucleotides have shown that the mean temperature increases from 39.7 °C to 59.3 °C by
substituting all DNA A and T nucleotides with LNA A and T nucleotides. The variance in Tm
of all 9-mers furthermore decreases from 59.6 °C for DNA oligonucleotides to only 4.7 °C for
the LNA substituted oligonucleotides. The estimations are based on the latest LNA Tm
prediction algorithms such as those disclosed herein, which have a variance of 6-7 °C.
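
The equalizing effect described above can be illustrated with a deliberately crude toy model. The sketch below does not use the Tm prediction algorithms of the invention and will not reproduce the figures quoted in the text. It assumes the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C) for DNA, and further assumes that an LNA A or LNA T contributes the same 4 °C as a DNA G or C, which is one simple reading of the statement that their affinities correspond approximately.

```python
from itertools import product
from typing import Dict, Tuple

# Wallace rule of thumb for short DNA oligos: 2 °C per A/T, 4 °C per G/C.
DNA_CONTRIBUTION: Dict[str, int] = {"A": 2, "T": 2, "G": 4, "C": 4}

# Toy assumption: LNA A and LNA T contribute like DNA G and C.
LNA_AT_CONTRIBUTION: Dict[str, int] = {"A": 4, "T": 4, "G": 4, "C": 4}


def toy_tm(seq: str, contribution: Dict[str, int]) -> int:
    """Toy melting temperature: sum of per-base contributions."""
    return sum(contribution[base] for base in seq)


def tm_stats(length: int, contribution: Dict[str, int]) -> Tuple[float, float]:
    """Mean and population variance of the toy Tm over all oligos of a length."""
    tms = [toy_tm("".join(bases), contribution)
           for bases in product("ACGT", repeat=length)]
    mean = sum(tms) / len(tms)
    variance = sum((tm - mean) ** 2 for tm in tms) / len(tms)
    return mean, variance
```

Under these toy assumptions, substituting every A and T raises the mean Tm of all 9-mers and removes the composition-dependent spread entirely. The prediction algorithms referred to in the text report a smaller but still substantial reduction in variance.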

If desired, the capture efficiency of one or more nucleic acids can be increased by
including any of the high affinity nucleotides (e.g., LNA units) described herein within the
nucleic acids. The examples herein also provide algorithms for optimizing the substitution 
patterns of the nucleic acids to minimize self-complementarity that may otherwise inhibit the 
binding of the nucleic acids to target molecules. 
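
One simple way to quantify self-complementarity is sketched below. It is an assumed, simplified measure and not the algorithm provided in the examples: it reports the length of the longest stretch of an oligonucleotide whose reverse complement also occurs in the same oligonucleotide, so that the stretch could pair within the molecule or with a second copy of it. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def longest_self_complementary_stretch(seq: str) -> int:
    """Length of the longest stretch of seq that can pair with seq itself.

    A stretch qualifies if it also occurs in the reverse complement of seq,
    which means its complementary partner is present in the oligonucleotide.
    """
    rc = reverse_complement(seq)
    # Try the longest stretches first and stop at the first hit.
    for size in range(len(seq), 0, -1):
        for start in range(len(seq) - size + 1):
            if seq[start:start + size] in rc:
                return size
    return 0
```

A design procedure could compute this value for each candidate and prefer substitution patterns or sequences with shorter self-complementary stretches.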

For various applications of the nucleic acids and arrays of the invention, LNA A and
LNA T substitutions are made to equalize the melting temperatures of the nucleic acids. In
other embodiments, LNA A and LNA C substitutions are made to minimize
self-complementarity and to increase specificity. LNA C and LNA T substitutions also minimize
self-complementarity. Additionally, oligonucleotides containing LNA C and LNA T are
desirable because these modified nucleotides are easy to synthesize and are especially useful for
applications such as antisense technology in which minimizing cost is especially desirable.

The following non-limiting examples are illustrative of the invention. All documents
mentioned herein are incorporated herein by reference in their entirety. In the following
Examples, compound reference numbers designate the compound as shown in Schemes 1 and 2
herein.

Example 1: The Use of LNA-modified Oligonucleotides in Microarrays Provides Significantly
Improved Sensitivity and Specificity in Expression Profiling

This example demonstrates the advantages of using LNA oligonucleotide microarrays in
gene expression profiling experiments. Capture probes for the Saccharomyces cerevisiae genes
SWI5 (YDR146C) and THI4 (YGR144W) were designed as 50-mer standard DNA and
different LNA/DNA "mixmer" oligonucleotides (i.e., oligonucleotides containing both LNA
and DNA nucleotides), respectively, for comparison (Table 2). In addition, 40-mer
oligonucleotides were designed as truncated versions of the 50-mer capture probes (Table 2).
The specificity of the LNA oligoarrays was addressed by introducing 1-5 consecutive
mismatches positioned in the middle of 40-mer LNA/DNA mixmer capture probes with LNA in
every fourth position. To assess the sensitivity of DNA versus LNA capture probes in complex
hybridization mixtures, in vitro synthesized yeast RNA for either SWI5 or THI4 was spiked into
Caenorhabditis elegans total RNA for cDNA target synthesis. These experiments are described
further below.
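
The construction of mismatch control probes described above can be sketched as follows. The helper names are hypothetical, and the choice to replace each central base with its complement is an assumption made so that every altered position is guaranteed to mismatch; the text does not state which substitutions were used.

```python
from typing import List

# Assumed substitution: swap each base for its complement.
SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatches(probe: str, n_mismatches: int) -> str:
    """Return the probe with n consecutive mismatches in its middle."""
    start = (len(probe) - n_mismatches) // 2
    middle = "".join(SWAP[base] for base in probe[start:start + n_mismatches])
    return probe[:start] + middle + probe[start + n_mismatches:]


def lna_positions(length: int, every: int) -> List[int]:
    """1-based positions that carry an LNA when every n-th position is LNA."""
    return list(range(every, length + 1, every))
```

For a 40-mer with LNA in every fourth position, lna_positions(40, 4) gives ten LNA positions, and calling central_mismatches with 1 to 5 gives the five mismatch variants.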

Cultivation of Caenorhabditis elegans worms 

Mixed stage C. elegans cultures were grown according to standard methods. Samples 
were harvested by centrifugation at 3,000xg, suspended in RNAlater storage buffer (Ambion,
USA), and immediately frozen in liquid nitrogen. 

RNA extraction

RNA was extracted from the worm samples using the FastRNA® Kit, GREEN (Q-BIO)
essentially according to the supplier's instructions.

In vitro RNA synthesis

Amplification of the yeast genes was performed using standard PCR with yeast genomic
DNA as the template. In the first step, a forward primer containing a restriction enzyme site
and a reverse primer containing a universal linker sequence were used. In the second PCR
reaction, the reverse primer was exchanged with a nested primer containing a poly-T20 tail and
a restriction enzyme site. The DNA fragments were ligated into the pTRIamp18 vector
(Ambion, USA) using the Quick Ligation Kit (New England Biolabs, USA) according to the
supplier's instructions and transformed into E. coli DH5α by standard methods. The PCR
clones were sequenced using M13 forward and M13 reverse primers on an ABI 377 (Applied
Biosystems, USA). Synthesis of in vitro RNA was carried out using the MEGAscript™ T7 Kit
(Ambion, USA) according to the manufacturer's instructions.

Design and synthesis of the LNA capture probes

To design the capture probes, regions with unique mRNA sequence of the selected
target genes were identified. The optimal 50-mer oligonucleotide sequences with respect to Tm,
self-complementarity, and secondary structure were selected. LNA modifications were
incorporated to increase affinity and specificity.

Printing of the LNA microarrays

The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark)
using the Biochip One Arrayer from Packard Biochip Technologies (Packard, USA). The arrays
were printed with a spot volume of 2 x 300 pL of a 10 µM capture probe solution. Four replicates
of the capture probes were printed on each slide.

Synthesis of fluorochrome-labelled first strand cDNA from total RNA

Ten ng of S. cerevisiae in vitro synthesized RNA (either SWI5 or THI4) was combined
with 10 µg of C. elegans total RNA and 5 µg oligo dT primer (T20VN) in an RNase-free,
pre-siliconized 1.5 mL tube, and the final volume was adjusted with DEPC-water to 8 µL. The
reaction mixture was heated at +70°C for 10 minutes, quenched on ice for 5 minutes, and spun
for 20 seconds, followed by addition of 1 µL SUPERase-In™ (20 U/µL, RNAse inhibitor,
Ambion, USA), 4 µL 5x RTase buffer (Invitrogen, USA), 2 µL 0.1 M DTT (Invitrogen, USA),
1 µL dNTP (20 mM dATP, dGTP, dTTP; 0.4 mM dCTP in DEPC-water, Amersham Pharmacia
Biotech, USA), and 3 µL Cy3™-dCTP or Cy5™-dCTP (Amersham Pharmacia Biotech, USA).
First strand cDNA synthesis was carried out by adding 1 µL of Superscript™ II (Invitrogen,
200 U/mL), mixing, and incubating the reaction mixture for one hour at 42°C. An additional 1
µL of Superscript™ II was added, and the cDNA synthesis reaction mixture was incubated for
an additional one hour at 42°C; the reaction was stopped by heating at 70°C for 5 minutes, and
quenching on ice for 2 minutes. The RNA was hydrolyzed by adding 3 µL of 0.5 M NaOH,
and incubating at 70°C for 15 minutes. The samples were neutralized by adding 3 µL of 0.5 M
HCl, and purified by adding 450 µL 1xTE buffer, pH 7.5 to the neutralized sample and
transferring the samples onto a Microcon-30 concentrator. The samples were centrifuged at
14000xg in a microcentrifuge for ~8 minutes, the flow-through was discarded and the washing
step was repeated twice by refilling the filter with 450 µL 1xTE buffer and by spinning for ~12
minutes. Centrifugation was continued until the volume was reduced to 5 µL, and finally the
labelled cDNA probe was eluted by inverting the Microcon-30 tube and spinning at 1000xg for
3 minutes.

Hybridization with fluorochrome-labelled cDNA 

The arrays were hybridized overnight using the following protocol. The Cy3™- or
Cy5™-labelled cDNA samples were combined in one tube, followed by addition of 3 µL
20xSSC (3xSSC final), 0.5 µL 1 M HEPES, pH 7.0 (25 mM final), 25 µg yeast tRNA (1.25
µg/µL final), 10 µg PolyA blocker (0.5 µg/µL final), 0.6 µL 10% SDS (0.3% final), and
DEPC-treated water to 20 µL final volume. The labelled cDNA target sample was filtered in a
Millipore 0.22 micron spin column according to the manufacturer's instructions (Millipore,
USA), and the probe was denatured by incubating the reaction at 100°C for 2 minutes. The
sample was cooled at 20-25°C for 5 minutes by spinning at maximum speed in a
microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of
the microarray spotted on the Immobilizer™ MicroArray Slide, and the hybridization mixture was
applied to the array from the side. An aliquot of 30 µL of 3xSSC was added to both ends of the
hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the
hybridization chamber. The chamber was sealed watertight and incubated at 65°C for 16-18
hours submerged in a water bath. After hybridization, the slide was removed carefully from the
hybridization chamber and washed using the following protocol. The LifterSlip coverslip was
washed off in 2 x SSC, pH 7.0 containing 0.1% SDS at room temperature for one minute,
followed by washing of the microarrays in 1.0 x SSC, pH 7.0 at room temperature
for one minute, and then in 0.2 x SSC, pH 7.0 at room temperature for one minute. Finally, the
slides were washed for 5 seconds in 0.05 x SSC, pH 7.0. The slides were then dried by
centrifugation in a swinging bucket rotor at approximately 600 rpm for 5 minutes.
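
The final concentrations quoted for the hybridization mixture follow from simple dilution into the 20 µL final volume. The short Python sketch below is an illustration only, not part of the described method; the dictionary and function names are hypothetical, and the stock values are taken directly from the protocol text above.

```python
# Illustrative check of the hybridization mix arithmetic (names are hypothetical).
FINAL_VOLUME_UL = 20.0

# Components added as a volume of a concentrated stock:
# name -> (stock concentration, unit, volume added in µL)
stocks = {
    "SSC": (20.0, "x", 3.0),
    "HEPES": (1000.0, "mM", 0.5),   # 1 M stock expressed in mM
    "SDS": (10.0, "%", 0.6),
}

# Components specified by mass (µg) rather than by stock volume.
masses_ug = {"yeast tRNA": 25.0, "PolyA blocker": 10.0}


def final_concentration(stock_conc, volume_ul, final_volume_ul=FINAL_VOLUME_UL):
    """Concentration after diluting volume_ul of stock into final_volume_ul."""
    return stock_conc * volume_ul / final_volume_ul


final = {name: final_concentration(conc, vol) for name, (conc, _, vol) in stocks.items()}
final_ug_per_ul = {name: mass / FINAL_VOLUME_UL for name, mass in masses_ug.items()}

for name, (_, unit, _) in stocks.items():
    print(f"{name}: {final[name]:g} {unit} final")
for name, value in final_ug_per_ul.items():
    print(f"{name}: {value:g} µg/µL final")
```

The computed values agree with the final concentrations stated in the protocol (3xSSC, 25 mM HEPES, 0.3% SDS, 1.25 µg/µL tRNA, and 0.5 µg/µL PolyA blocker).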

Data analysis

Following washing and drying, the slides were scanned using a ScanArray 4000XL 
scanner (Perkin-Elmer Life Sciences, USA), and the array data were processed using the 
GenePix™ Pro 4.0 software package (Axon, USA). 
Results 

Incorporation of LNA nucleotides at every third nucleotide position in standard 50-mer
expression array oligonucleotide capture probes resulted in a 3-fold increase in fluorescence
intensity levels when hybridized under standard stringency conditions (Figs. 12A and 12B).
When the hybridization temperature was increased from 65°C to 70°C, the capture of the SWI5
spike mRNA by LNA 50-mer oligos was increased by 8-fold relative to the DNA controls. Thus,
it can clearly be concluded that oligonucleotides containing LNA units are more sensitive in
expression profiling compared to oligonucleotides containing only DNA units. The specificity
of 40-mer LNA/DNA mixmer capture probes in the discrimination of highly homologous target
sequences was addressed by introducing 1-5 consecutive mismatches in the middle of SWI5 and
THI4 capture oligos together with the corresponding DNA controls. As demonstrated in Figs.
13A and 13B, the LNA-spiked (LNA modification at every fourth nucleotide position) 40-mer
triple mismatch oligos showed a 3-fold signal intensity decrease relative to the perfectly
matched duplexes, whereas the corresponding 40-mer standard DNA capture probes did not
form duplexes under standard hybridization stringency. Further, the 40-mer perfect match LNA
capture probes showed a 5-fold to 14-fold increase in the intensity levels compared to DNA



oligonucleotides under standard hybridization conditions. Capture probes of other lengths
and/or with other LNA substitution patterns can be used similarly.

Table 2. DNA and LNA-modified SWI5 (YDR146C) and THI4 (YGR144W) oligonucleotide
capture probes. LNA modifications are depicted by uppercase letters in the sequence, "mt"
denotes the number of mismatches (bolded) in the center of the oligonucleotide with respect to
its target cDNA (mRNA), and "mC" denotes LNA methyl cytosine.

Oligo Name            Sequence

YDR146C-40            acggggattatggtttcgccaatgaaaactaatcaaaggt
YDR146C-40_mt1        acggggattatggtttcgcctatgaaaactaatcaaaggt
YDR146C-40_mt2        acggggattatggtttcgcgtatgaaaactaatcaaaggt
YDR146C-40_mt3        acggggattatggtttcgggtatgaaaactaatcaaaggt
YDR146C-40_mt4        acggggattatggtttcgggtttgaaaactaatcaaaggt
YDR146C-40_mt5        acggggattatggtttcgggttagaaaactaatcaaaggt
YDR146C-40_LNA4       acGgggAttaTggtTtcgmCcaaTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt1   acGgggAttaTggtTtcgmCctaTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt2   acGgggAttaTggtTtcgmCgtaTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt3   acGgggAttaTggtTtcgGgtaTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt4   acGgggAttaTggtTtcgGgttTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt5   acGgggAttaTggtTtcgGgttAgaaAactAatcAaagGt
YDR146C-50            tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt
YDR146C-50_mt1        tgggaatggaacggggattatggtatcgccaatgaaaactaatcaaaggt
YDR146C-50_mt2        tgggaatggaacggggattatggtaacgccaatgaaaactaatcaaaggt
YDR146C-50_mt3        tgggaatggaacggggattatggtaaggccaatgaaaactaatcaaaggt
YDR146C-50_mt4        tgggaatggaacggggattatggaaaggccaatgaaaactaatcaaaggt
YDR146C-50_mt5        tgggaatggaacggggattatggaaagcccaatgaaaactaatcaaaggt
YDR146C-50_LNA2       TgGgAaTgGaAcGgGgAtTaTgGtTtmCgmCcAaTgAaAamCtAaTcAaAgGt
YDR146C-50_LNA3       TggGaaTggAacGggGatTatGgtTtcGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA4       TgggAatgGaacGgggAttaTggtTtcgmCcaaTgaaAactAatcAaagGt
YDR146C-50_LNA5       TgggaAtggaAcgggGattaTggttTcgccAatgaAaactAatcaAaggt
YDR146C-50_LNA6       TgggaaTggaacGgggatTatggtTtcgccAatgaaAactaaTcaaagGt
YDR146C-50_LNA3_mt1   TggGaaTggAacGggGatTatGgtAtcGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA3_mt2   TggGaaTggAacGggGatTatGgtAacGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA3_mt3   TggGaaTggAacGggGatTatGgtAagGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA3_mt4   TggGaaTggAacGggGatTatGgaAagGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA3_mt5   TggGaaTggAacGggGatTatGgaAagmCccAatGaaAacTaaTcaAagGt
YGR144W-40            ttgctgaactggatggattaaaccgtatgggtccaacttt
YGR144W-40_mt1        ttgctgaactggatggatttaaccgtatgggtccaacttt
YGR144W-40_mt2        ttgctgaactggatggatataaccgtatgggtccaacttt
YGR144W-40_mt3        ttgctgaactggatggatattaccgtatgggtccaacttt



YGR144W-40_mt4        ttgctgaactggatggatatttccgtatgggtccaacttt
YGR144W-40_mt5        ttgctgaactggatggatatttgcgtatgggtccaacttt
YGR144W-40_LNA4       ttGctgAactGgatGgatTaaamCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt1   ttGctgAactGgatGgatTtaamCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt2   ttGctgAactGgatGgatAtaamCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt3   ttGctgAactGgatGgatAttamCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt4   ttGctgAactGgatGgatAtttmCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt5   ttGctgAactGgatGgatAtttGcgtAtggGtccAactTt
YGR144W-50            ggtatggaagttgctgaactggatggattaaaccgtatgggtccaacttt
YGR144W-50_mt1        ggtatggaagttgctgaactggatcgattaaaccgtatgggtccaacttt
YGR144W-50_mt2        ggtatggaagttgctgaactggatccattaaaccgtatgggtccaacttt
YGR144W-50_mt3        ggtatggaagttgctgaactggaaccattaaaccgtatgggtccaacttt
YGR144W-50_mt4        ggtatggaagttgctgaactggaacctttaaaccgtatgggtccaacttt
YGR144W-50_mt5        ggtatggaagttgctgaactggaacctataaaccgtatgggtccaacttt
YGR144W-50_LNA3       GgtAtgGaaGttGctGaamCtgGatGgaTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt1   GgtAtgGaaGttGctGaamCtgGatmCgaTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt2   GgtAtgGaaGttGctGaamCtgGatmCcaTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt3   GgtAtgGaaGttGctGaamCtgGaamCcaTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt4   GgtAtgGaaGttGctGaamCtgGaamCctTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt5   GgtAtgGaaGttGctGaamCtgGaamCctAtaAacmCgtAtgGgtmCcaActTt
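
The sequence notation of Table 2 (uppercase for LNA, lowercase for DNA, "mC" for LNA methyl cytosine) can be read mechanically. The following Python sketch is offered only as an illustration of that notation; the function name `parse_mixmer` is hypothetical and is not part of the described method.

```python
import re

# One token per nucleotide: "mC" (LNA methyl cytosine) or a single base letter.
TOKEN = re.compile(r"mC|[ACGTacgt]")


def parse_mixmer(seq):
    """Return (plain DNA sequence, 1-based LNA positions) for a Table 2 style string."""
    tokens = TOKEN.findall(seq)
    if "".join(tokens) != seq:
        raise ValueError("unexpected character in sequence")
    # "mC" stands for one cytosine; every other token is a single base.
    bases = "".join("c" if t == "mC" else t.lower() for t in tokens)
    lna_positions = [
        i for i, t in enumerate(tokens, start=1) if t == "mC" or t.isupper()
    ]
    return bases, lna_positions


bases, lna_positions = parse_mixmer("acGgggAttaTggtTtcgmCcaaTgaaAactAatcAaagGt")
print(bases)
print(lna_positions)
```

Applied to YDR146C-40_LNA4, the parsed base sequence is identical to the unmodified YDR146C-40 probe, and the LNA positions are spaced four nucleotides apart, consistent with the "LNA4" designation.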

Example 2: Detection of Alternative Splice Isoforms Using Exon-Specific, Internal LNA
Capture Probes in the Caenorhabditis elegans Gene let-2
Capture probe design
Finding the regions of interest

From the database "intronerator" [http://www.cse.ucsc.edu/~kent/intronerator/] as well
as the scientific literature, the C. elegans let-2 gene encoding type IV collagen was identified
according to the following criteria. The generation of mature mRNA desirably involves either
complete exon or intron skipping. ESTs (expressed sequence tags) desirably indicate different
isoforms. If ESTs were only different from the gene annotation(s), this could simply mean that
the prediction is wrong, and nothing more. Desirably, there are different EST splice indications
in different developmental stages. Two gene prediction algorithms (e.g., GeneFinder and
Genie) desirably agree upon the number of genes in a coding segment. Exons of interest (e.g.,
exons being skipped and their flanking exons) in the C. elegans gene T01D3.3 desirably exceed
70 base-pairs. Other genes of interest may be selected using one or more of the above criteria
or using other criteria, such as the medical relevance of the gene.



Determining melting temperatures and palindromic properties of the C. elegans let-2
gene/exons 8, 9, 10, and 11-specific capture probes

The script PICK70 (which was kindly provided by Jingchun Zhu from the Joe DeRisi
Laboratory and which is publicly available) was used to run a sliding 50 base-pair window
across the regions in which an oligonucleotide capture probe should be designed. The output
data were saved for later use.
Determining the uniqueness of the regions

All regions were compared using a publicly available BLAST program to the complete
set of annotated transcripts from the C. elegans genome downloaded from NCBI. For each
region a list with the location of all BLAST hits was made.
Choice of Desirable Tm for Capture Probes

From the PICK70 output, the distribution of melting temperatures for all possible
oligonucleotides was collected. As these centered around approximately 80°C, this temperature
was chosen as the desirable temperature.
For each region, an oligonucleotide with a palindromic value below 100 (default value
in PICK70, value based on Smith-Waterman algorithm) and with a melting temperature closest
to 80°C was picked. The location of the oligonucleotide within the region was then compared
to the list made using the above BLAST search. If the oligonucleotide did not coincide with a
BLAST hit exceeding around 25 (consecutive) base-pairs, this oligonucleotide sequence was
chosen as a 50-mer capture probe. Otherwise, a new oligo sequence was picked from the
PICK70 output.
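
The selection logic described above (a sliding 50 base-pair window, a target melting temperature of 80°C, and rejection of windows that coincide with a BLAST hit of about 25 consecutive base-pairs) can be sketched as follows. This is a minimal illustration under stated assumptions and is not the PICK70 script: the melting temperature is estimated with a generic GC-content formula, the 0.05 M sodium concentration is an assumed value, the palindromic-value filter is omitted, and BLAST hits are represented as hypothetical (start, end) intervals within the region.

```python
import math


def gc_tm(seq, na_molar=0.05):
    """Rough Tm estimate (deg C) from GC content; a stand-in for PICK70's own calculation."""
    n = len(seq)
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 81.5 + 16.6 * math.log10(na_molar) + 41.0 * gc / n - 675.0 / n


def overlaps(start, end, hits, min_len=25):
    """True if window [start, end) shares at least min_len bases with any hit interval."""
    return any(min(end, h_end) - max(start, h_start) >= min_len for h_start, h_end in hits)


def pick_probe(region, hits, window=50, target_tm=80.0):
    """Return (start, sequence) of the allowed window whose Tm is closest to target_tm."""
    candidates = []
    for start in range(len(region) - window + 1):
        if overlaps(start, start + window, hits):
            continue  # window coincides with a BLAST hit; try the next one
        tm = gc_tm(region[start:start + window])
        candidates.append((abs(tm - target_tm), start))
    if not candidates:
        return None
    _, best = min(candidates)
    return best, region[best:best + window]


# Synthetic example region (not a real let-2 sequence) with one hypothetical BLAST hit.
region = "AT" * 30 + "GC" * 30
result = pick_probe(region, hits=[(0, 40)])
print(result)
```

In practice the melting temperatures and palindromic values would be taken from the PICK70 output rather than recomputed, and the hit list would come from the BLAST search against the annotated transcripts.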
Checking Probe Sequences

The selected 50-mer oligonucleotide sequences were "BLASTed" against the C. elegans
transcripts again, as described above.
Accounting for the Introns

The oligonucleotide sequences were "BLASTed" against the complete C. elegans
genome. The matches were run against a list made from the GenBank reports of the complete
genome, indexing whether positions in the genome were genic or intergenic.

It was checked to determine whether new hits to genic regions appeared (compared to
the initial BLAST search using the PICK70 output). If this was not the case, the oligonucleotide
sequences were selected for capture probe synthesis.
Design of the LNA-modified Capture Probes

For the LNA-modified oligonucleotide capture probes, every fourth DNA nucleotide
was substituted with an LNA nucleotide, as shown in Table 3. The oligonucleotides were
synthesized with an anthraquinone (AQ) moiety at the 5'-end of each oligonucleotide (e.g., as
described in allowed U.S.S.N. 09/611,833), followed by a hexaethyleneglycol tetramer linker
region and the LNA/DNA mixmer capture oligonucleotide sequence.
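
The substitution pattern (an LNA nucleotide at every fourth position) can be illustrated with a short Python sketch. The function name `spike_lna` and its `offset` parameter are hypothetical and serve only to show the notation; the example input is the YDR146C-40 sequence of Table 2, whose LNA4 variant starts the pattern at the third nucleotide.

```python
def spike_lna(dna, every=4, offset=0):
    """Mark every `every`-th nucleotide, starting at 0-based index `offset`, as LNA.

    LNA nucleotides are written in uppercase; LNA cytosine is written as "mC"
    (LNA methyl cytosine), following the notation of the sequence tables.
    """
    out = []
    for i, base in enumerate(dna.lower()):
        if i >= offset and (i - offset) % every == 0:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


mixmer = spike_lna("acggggattatggtttcgccaatgaaaactaatcaaaggt", every=4, offset=2)
print(mixmer)
```

With these arguments the sketch reproduces the YDR146C-40_LNA4 entry of Table 2. The starting position of the pattern is a design choice, so it is exposed as a parameter rather than fixed.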


Table 3. C. elegans let-2 gene/exons 8, 9, 10, and 11-specific capture probes

Capture probe    Sequence (LNA=uppercase, DNA=lowercase letters)

CE42.08-0HEG4    GgctGgatmCccxjAggaAaccmCaegAatcGeaaGcatTggamCcaaAageAg
CE42.09-0HEG4    mCaccGeatmCcggmCtcaAtteTcBeAcctmCgcsGaaamCcctGgagAaaaGe
CE42.10-0HEG4    TcccmCcaeGcccAatcGcctmCcacmCatgTccaAgcEAaccAttaTccsTc
CE42.11-0HEG4    GagcmCaggAgagGsagGtcaAcecGgttAcccAeeaAateGaeeActcTc


Strains and Growth Conditions 

C. elegans wild-type strain (Bristol-N2) was maintained on nematode growth medium
(NG) plates seeded with Escherichia coli strain OP50 at 20°C, and the eggs and L1 larvae were
prepared as described in Hope, I. A. (ed.) "C. elegans - A Practical Approach", Oxford
University Press 1999. The samples were immediately flash frozen in liquid N2 and stored at
-80°C until RNA isolation.
Isolation of Total RNA 

A 100 µl aliquot of packed C. elegans worms from an L1 larvae population was
homogenized using the FastPrep Bio101 from Kem-En-Tec for 1 minute at speed 6 followed by
isolation of total RNA from the extracts using the FastPrep Bio101 kit (Kem-En-Tec) according
to the manufacturer's instructions. A 50 µl aliquot of packed C. elegans eggs was homogenized
in lysis buffer (RNeasy total RNA purification kit, QIAGEN) containing quartz sand for 3
minutes using a Pellet Pestle Motor followed by isolation of total RNA according to the
manufacturer's manual.

The eluted total RNA from worms (L1 larvae) as well as eggs was ethanol precipitated
for 24 hours at -20°C by addition of 2.5 volumes of 96% EtOH and 0.1 volume of 3 M
Na-acetate, pH 5.2 (Ambion, USA), followed by centrifugation of the total RNA sample for 30
minutes at 13200 rpm. The total RNA pellet was air-dried and redissolved in 6 µl (worms) or
2.5 µl (eggs) of diethylpyrocarbonate (DEPC)-treated water (Ambion, USA) and stored at
-80°C.

Reverse transcription (RT)-PCR 

Total RNA (1.5 µg) from eggs or 1 µg total RNA from worms (L1 larvae) was mixed
with 5 µg oligo(dT)12-18 primer (Amersham Pharmacia Biotech, USA) and 0.5 µg of random
hexamers, pd(N)6 (Amersham Pharmacia Biotech, USA) and DEPC-treated water to a final
volume of 7 µl. The mixture was heated at 70°C for 10 minutes, quenched on ice for 5
minutes, followed by addition of 20 units of SUPERase-In RNase inhibitor (Ambion, USA), 4 µl of
5 x Superscript buffer (Life Technologies, USA), 2 µl of 100 mM DTT, 1 µl of dNTP solution
(20 mM each dATP, dGTP, dTTP and dCTP, Amersham Pharmacia Biotech, USA), and 3 µl of
DEPC-treated water.

The primers were pre-annealed at 37°C for 5 minutes, followed by addition of 400 units
of Superscript II reverse transcriptase (Invitrogen, USA). First strand cDNA synthesis was
carried out at 37°C for 30 minutes, followed by 2 hours at 42°C, and the reaction was stopped
by incubation at 70°C for 5 minutes, followed by incubation on ice for 5 minutes.

Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR
columns as described below. The column was pre-spun for 1 minute at 735 x g in a 1.5 ml
tube, and the column was placed in a new 1.5 ml tube. The cDNA sample was slowly applied
to the top center of the resin and spun at 735 x g for 2 minutes. The eluate was collected. The
volume of the eluate was adjusted to 50 µl with TE-buffer pH 7.0 before being used as the
template for linear PCR. Four µl template (RT from eggs or worms) was combined with 1 µl
dNTP solution (10 mM each dATP, dGTP, dTTP and dCTP, Amersham Pharmacia Biotech,
USA), 1 µl of each primer (20 µM CE42.07 sense: gatcgaattcctccaggagagaagggagatg, and
CE42.12 antisense: 5'gatcaagcttatctcttcctgggtatccagctt), 5 µl 10 x AmpliTaq Gold Polymerase
buffer, 5 µl 25 mM MgCl2, 0.5 µl AmpliTaq Gold DNA polymerase (5 U/µl, Applied
Biosystems), 2 µl Cy3-dCTP (Amersham Pharmacia Biotech, USA) (eggs) or 2 µl Cy5-dCTP
(Amersham Pharmacia Biotech, USA) (worms), and 31.5 µl DEPC-treated water to a final
volume of 50 µl. The PCR reactions were carried out using the following program: 95°C for 5
minutes followed by 30 cycles of PCR using the following cycling program (denaturation at
95°C for 45 seconds, annealing at 60°C for 30 seconds, and extension at 72°C for 1 minute)
followed by a final extension step at 72°C for 10 minutes and incubation on ice for 5 minutes.
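
The cycling program stated above can be written down as a small data structure, which makes the programmed hold time easy to total. The Python sketch below is an illustration only; the class and variable names are hypothetical, and the total excludes instrument ramp times and the final incubation on ice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from the program described in the text.
INITIAL = Step("initial denaturation", 95, 5 * 60)
CYCLE = (
    Step("denaturation", 95, 45),
    Step("annealing", 60, 30),
    Step("extension", 72, 60),
)
N_CYCLES = 30
FINAL = Step("final extension", 72, 10 * 60)


def programmed_seconds():
    """Total programmed hold time, excluding ramping and the incubation on ice."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


total = programmed_seconds()
print(f"{total} s ({total / 60:.1f} min) of programmed hold time")
```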

Purification of the PCR amplicons from eggs as well as worms was performed using a 
Qiaquick PCR purification kit (QIAGEN) according to the manufacturer's instructions. 
Fluorochrome-labeling of the let-2 cDNA Fragments using Primer Extension 

Four (4) µl template (RT from eggs or worms) was combined with 1 µl dNTP solution
(10 mM each dATP, dGTP, dTTP and dCTP, Amersham Pharmacia Biotech, USA), 1 µl of
primer (20 µM CE42.12 antisense 5'gatcaagcttatctcttcctgggtatccagctt), 5 µl 10 x AmpliTaq
Gold Polymerase buffer, 5 µl 25 mM MgCl2, 0.5 µl AmpliTaq Gold DNA polymerase (5 U/µl,
Applied Biosystems), 2 µl Cy3-dCTP (Amersham Pharmacia Biotech, USA) (eggs) or 2 µl
Cy5-dCTP (Amersham Pharmacia Biotech, USA) (worms), and 31.5 µl DEPC-treated water to a final
volume of 50 µl. The PCR reactions were carried out using the following program: 95°C for 5
minutes followed by 30 cycles of PCR using the following cycling program (denaturation at
95°C for 45 seconds, annealing at 60°C for 30 seconds, and extension at 72°C for 1 minute) followed
by a final extension step at 72°C for 10 minutes and incubation on ice for 5 minutes.
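
As a cross-check of the primer extension mixture, the component volumes can be summed and the final concentrations derived by dilution into 50 µl. The Python sketch below is illustrative only; the dictionary names are hypothetical, the primer stock is assumed to be 20 µM, and the magnesium figure counts only the separately added MgCl2 (any magnesium contained in the polymerase buffer is not considered).

```python
FINAL_VOLUME_UL = 50.0

# Component volumes (µl) as listed for the single-primer labeling reaction.
volumes_ul = {
    "template": 4.0,
    "dNTP solution": 1.0,
    "primer": 1.0,
    "10x polymerase buffer": 5.0,
    "25 mM MgCl2": 5.0,
    "AmpliTaq Gold": 0.5,
    "Cy-dCTP": 2.0,
    "DEPC-treated water": 31.5,
}

# Stock concentrations for the components whose final concentration is of interest.
stock = {
    "dNTP solution": (10.0, "mM each"),
    "primer": (20.0, "µM"),          # assumed stock concentration
    "10x polymerase buffer": (10.0, "x"),
    "25 mM MgCl2": (25.0, "mM"),
}

total_ul = sum(volumes_ul.values())
final = {
    name: conc * volumes_ul[name] / FINAL_VOLUME_UL for name, (conc, _) in stock.items()
}
polymerase_units = 5.0 * volumes_ul["AmpliTaq Gold"]  # 5 U/µl stock

print(f"total volume: {total_ul:g} µl")
for name, (_, unit) in stock.items():
    print(f"{name}: {final[name]:g} {unit} final")
print(f"polymerase: {polymerase_units:g} U per reaction")
```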

Purification of the PCR amplicons from eggs as well as worms was performed using a
Qiaquick PCR purification kit (QIAGEN) according to the manufacturer's instructions.
Unincorporated dNTP nucleotides were removed by gel filtration using MicroSpin S-400 HR
columns as described above before the eluted, fluorochrome-labelled DNA fragments were
stored at -20°C in the dark until microarray hybridization.

Printing and Coupling of the C. elegans let-2 Exon 8-11 Microarrays

The C. elegans gene let-2/exon 8-11 capture probes were synthesized with a 5'
anthraquinone (AQ)-modification, followed by a hexaethyleneglycol-4 (HEG4) linker (Table
3). The capture probes were first diluted to a 10 µM final concentration in 100 mM
Na-phosphate buffer pH 7.0 and spotted on Euray COP microarray slides using the Biochip Arrayer
One (Packard Biochip Technologies) with a spot volume of 300 pl and 300 µm between the
spots.

The capture probes were immobilized onto the microarray slide by UV irradiation in a
Stratalinker for 90 seconds at full power (Stratagene, USA). Non-immobilized capture probe
oligonucleotides were removed from the slides by washing the slides for ½ hour in 30% acetone
before rinsing in milli-Q H2O. After washing, the slides were centrifuged at 800 rpm for 2
minutes and stored in a slide box until microarray hybridization.

Comparative Hybridization of the C. elegans microarrays and Post-hybridization Washes 

The slides were hybridized with 2.5 µl of the Cy3-labelled and 2.5 µl of the
Cy5-labelled target preparation from eggs and worms, respectively, as described above (see
"Reverse transcription (RT)-PCR" section) in 25 µl of hybridization solution, containing 25
mM HEPES, pH 7.0, 3 x SSC, 0.3% SDS, and 25 µg of yeast tRNA. The target probe was
filtered in a Millipore 0.22 micron spin column (Ultrafree-MC, Millipore, USA), denatured by
incubation at 100°C for 5 minutes, cooled at room temperature for 5 minutes, and then carefully
applied onto the prepared microarray. One-third of a cover slip was laid over the microarray,
and the hybridization was performed for 16-18 hours at 65°C in a hybridization chamber
(DieTech, model Joe deRisi, USA).

Following hybridization, the slides were washed sequentially by plunging gently in 2 x
SSC/0.1% SDS at room temperature until the cover slip fell off into the washing solution, then
in 1 x SSC pH 7.0 (150 mM NaCl, 15 mM Sodium Citrate) at room temperature for 1 minute,
then in 0.2 x SSC, pH 7.0 (30 mM NaCl, 3 mM Sodium Citrate) at room temperature for 1
minute, and finally in 0.05 x SSC (7.5 mM NaCl, 0.75 mM Sodium Citrate) for 5 seconds,
followed by drying of the slides by spinning at 500 rpm for 5 minutes. The slides were stored
in a slide box in the dark until scanning.
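
The salt concentrations given in parentheses for each wash scale linearly with the SSC strength, taking 1 x SSC as 150 mM NaCl and 15 mM sodium citrate as stated above. A brief Python sketch of that relationship follows; the function name is hypothetical and the code is illustrative only.

```python
# Composition of 1 x SSC in mM, as stated in the wash protocol.
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_composition(fold):
    """Return the NaCl and sodium citrate concentrations (mM) of `fold` x SSC."""
    return {component: conc * fold for component, conc in SSC_1X_MM.items()}


for fold in (2.0, 1.0, 0.2, 0.05):
    comp = ssc_composition(fold)
    print(f"{fold:g} x SSC: {comp['NaCl']:g} mM NaCl, {comp['sodium citrate']:g} mM sodium citrate")
```

Each successive wash therefore lowers the ionic strength, which raises the stringency of the wash.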
Microarray data analysis 

The C. elegans let-2 gene microarray was scanned in an ArrayWoRx Scanner (Applied
Precision, USA) using an exposure time of 5 seconds, resolution of 5.0, and high (high level)
sensitivity. The hybridization data were analyzed using the ArrayVision image analysis
software package 5.1 (IMAGING Research Inc., USA). The detection principle for alternative
exon skipping in the C. elegans let-2 gene is shown in Fig. 14. As demonstrated in Fig. 15,
analysis of the comparative hybridization data from the C. elegans let-2 exon 8-11 array
demonstrates detection of alternative exon skipping of the let-2 exon 9 (eggs) and exon 10 (L1
larvae) using LNA-modified 50-mer capture probes. Capture probes of other lengths and/or
with other LNA substitution patterns can be used similarly.

Example 3: Improved Sensitivity in the Specific Detection of the C. elegans Gene T01D3.3
Exon 4 Using LNA-Modified Oligonucleotide Capture Probes

Capture Probe Design: The design method of exon-specific capture probes for the C. elegans
gene T01D3.3 exon 4 has been described in Example 2.

Design of the LNA-modified Capture Probes: For the LNA-spiked oligonucleotide capture
probes, every fourth DNA nucleotide was substituted with an LNA nucleotide, as shown in
Table 4.

Table 4. C. elegans gene T01D3.3/exon 4-specific capture probes.

Capture probes        Sequence (LNA=uppercase, DNA=lowercase letters)

CEgene26.04-70        ggctggaacagaagtttgttggtgcgtgacaaggtatggaagaagattatccggaaaagaaagcaaagac
CEgene26.04-50        ggctggaacagaagtttgttggtgcgtgacaaggtatggaagaagattat
CEgene26.04-40        ggctggaacagaagtttgttggtgcgtgacaaggtatgga
CEgene26.04-30        gaacagaagtttgttggtgcgtgacaaggt
CEgene26.04-50HEG2    GgctGgaamCagaAgttTgttGgtgmCgtgAcaaGgtaTggaAgaaGattAt
CEgene26.04-50HEG4    GgctGgaamCagaAgttTgttGgtgmCgtgAcaaGgtaTggaAgaaGattAt
CEgene26.04-40HEG2    GgctGgaamCagaAgttTgttGgtgmCgtgAcaaGgtaTgga
CEgene26.04-40HEG4    GgctGgaamCagaAgttTgttGgtgmCgtgAcaaGgtaTgga
CEgene26.04-30HEG2    GaacAgaaGtttGttgGtgcGtgamCaagGt
CEgene26.04-30HEG4    GaacAgaaGtttGttgGtgcGtgamCaagGt

Cultivation of Caenorhabditis elegans Worms

Mixed stage C. elegans cultures were grown according to standard methods. Samples
were harvested by centrifugation at 3000xg, suspended in RNAlater (Ambion, USA), and
immediately frozen in liquid nitrogen.
mRNA Isolation from C. elegans Mixed Stages Worms

Poly(A)+ RNA was isolated from the worm samples using the Pick-Pen (Bio-Nobile,
Finland) Starter kit combined with the KingFisher mRNA purification kit (ThermoLabsystems,
Finland) according to the manufacturer's instructions. The yield was 1-2 µg poly(A)+ RNA
from approximately 50 mg of C. elegans worms.

Synthesis of fluorochrome labelled first strand cDNA from C. elegans mRNA

One µg of C. elegans poly(A)+ RNA was combined with 2 µg oligo dT primer (T20VN)
in an RNase free, pre-siliconized 1.5 mL tube, and the final volume was adjusted with
DEPC-water to 8 µL. The reaction mixture was heated at +70°C for 10 minutes, quenched on ice for 5
minutes, spun for 20 seconds, followed by addition of 1 µL SUPERase-In™ (20 U/µL, RNase
inhibitor, Ambion, USA), 4 µL 5xRTase buffer (Invitrogen, USA), 2 µL 0.1 M DTT
(Invitrogen, USA), 1 µL dNTP (20 mM dATP, dGTP, dTTP; 4 mM dCTP in DEPC-water,
Amersham Pharmacia Biotech, USA), and 3 µL Cy3™-dCTP (Amersham Pharmacia Biotech,
USA). First strand cDNA synthesis was carried out by adding 1 µL of Superscript™ II
(Invitrogen, 200 U/mL), mixing, and incubating the reaction mixture for one hour at 42°C. An
additional 1 µL of Superscript™ II was added and the cDNA synthesis reaction mixture was
incubated for an additional one hour at 42°C; the reaction was stopped by heating at 70°C for 5
minutes, and quenching on ice for 2 minutes. The RNA was hydrolyzed by adding 3 µL of 0.5
M NaOH and incubating at 70°C for 15 minutes. The samples were neutralized by adding 3 µL
of 0.5 M HCl and purified by adding 450 µL 1xTE buffer, pH 7.5 to the neutralized sample and
transferring the samples onto a Microcon-30 concentrator. The samples were centrifuged at
14000xg in a microcentrifuge for ~8 minutes, the flow-through was discarded, and the washing
step was repeated twice by refilling the filter with 450 µL 1xTE buffer and by spinning for ~12
minutes. Centrifugation was continued until the volume was reduced to 5 µL, and finally the
labelled cDNA probe was eluted by inverting the Microcon-30 tube and spinning at 1000xg for
3 minutes.

Printing and Coupling of the C. elegans Microarrays 

The C. elegans gene T01D3.3/exon 4 capture probes were synthesized with a 5'
anthraquinone (AQ)-modification, followed by either a hexaethyleneglycol-2 or a
hexaethyleneglycol-4 (HEG2/HEG4) linker (Table 4). The capture probes were first diluted to
a 10 µM final concentration in 100 mM Na-phosphate buffer pH 7.0, followed by a two-fold
dilution series (10 µM, 5 µM, 2.5 µM, 1.25 µM, 0.625 µM, 0.31 µM, and 0.155 µM) and
spotted on Exiqon's polycarbonate microarray slides using the Biochip Arrayer One (Packard
Biochip Technologies, USA) with a spot volume of 3 x 300 pl and 400 µm between the spots.
The capture probes were immobilized onto the microarray slide by UV irradiation in a
Stratalinker for 90 seconds at full power (Stratagene, USA). Non-immobilized capture probe
oligonucleotides were removed from the slides by washing the slides for 24 hours in milli-Q
H2O. After washing, the slides were dried in an oven at 37°C for 30 minutes and stored in a
slide box until microarray hybridization.
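
The spotting concentrations form a two-fold dilution series starting at 10 µM. A short Python sketch of the series is given below for illustration; the function name is hypothetical. Note that the two lowest values quoted in the text (0.31 µM and 0.155 µM) are rounded forms of the exact halvings.

```python
def twofold_series(start_um=10.0, steps=7):
    """Concentrations (µM) of a two-fold dilution series with `steps` members."""
    return [start_um / 2 ** i for i in range(steps)]


series = twofold_series()
print(", ".join(f"{conc:g} µM" for conc in series))
```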
Hybridization with Cy3-labelled cDNA 

The arrays were hybridized overnight using the following protocol. The Cy3™-labelled
cDNA sample was combined with 3 µL 20xSSC (3xSSC final), 0.5 µL 1 M HEPES, pH 7.0 (25
mM final), 25 µg yeast tRNA (1.25 µg/µL final), 10 µg PolyA blocker (0.5 µg/µL final), 0.6
µL 10% SDS (0.3% final), and DEPC-treated water to 20 µL final volume. The labelled cDNA
target sample was filtered in a Millipore 0.22 micron spin column according to the
manufacturer's instructions (Millipore, USA), and the probe was denatured by incubating the
reaction at 100°C for 2 minutes. The sample was cooled at 20-25°C for 5 minutes by spinning
at maximum speed in a microcentrifuge, and then carefully applied on top of the microarray. A
cover slip was laid over the microarray and the hybridization was performed for 16 hours at
63°C in a hybridization chamber (Corning, USA) submerged in a water bath, with an aliquot of
30 µL of 3xSSC added to both ends of the hybridization chamber to prevent evaporation. After
hybridization, the slide was removed carefully from the hybridization chamber and washed
using the following protocol. The coverslip was washed off in 2 x SSC, pH 7.0 containing
0.1% SDS at room temperature for one minute, followed by washing of the microarrays
in 1.0 x SSC, pH 7.0 at room temperature for one minute, and then in 0.2 x SSC,
pH 7.0 at room temperature for one minute. Finally, the slides were washed for 5 seconds in
0.05 x SSC, pH 7.0. The slides were then dried by centrifugation in a swinging bucket rotor at
approximately 600 rpm for 5 minutes.

Microarray data analysis

The C. elegans gene T01D3.3 exon 4 array was scanned in an ArrayWoRx Scanner
(Applied Precision, USA) using an exposure time of 5 seconds, resolution of 5.0, and high
(high level) sensitivity. The hybridization data were analyzed using the ArrayVision image
analysis software package 5.1 (IMAGING Research Inc., USA). As shown in Figs. 16A and
16B, analysis of the hybridization data from the C. elegans gene 26/T01D3.3 exon 4 array
demonstrates that the use of LNA-modified capture probes for the C. elegans T01D3.3 exon 4
results in 5-fold increased sensitivity in exon 4 capture compared to the corresponding DNA
oligonucleotide capture probe controls printed on the same microarray. Capture probes of other
lengths and/or with other LNA substitution patterns can be used similarly.

Example 4: Assessment of Capture Probe Specificity for the C. elegans Gene T01D3.3 Exons 4
and 5 Using Synthetic Antisense Target Oligos

Capture probe design: Exon-specific capture probes for the C. elegans gene T01D3.3 exons 4
and 5 were designed as described in Example 2.

Design of the LNA-modified capture probes: For the LNA-spiked oligonucleotide capture
probes, every fourth DNA nucleotide was substituted with an LNA nucleotide, as shown in
Table 5.

Table 5. C. elegans gene T01D3.3/exons 4 and 5-specific capture probes and synthetic target
oligonucleotides.


Capture probes          Sequence (LNA=uppercase, DNA=lowercase letters)

CEgene26.04-70          ggctggaacagaagtttgttggtgcgtgacaaggtatggaagaagattatccggaaaagaaagcaaagac
CEgene26.05-70          tatgtggcgcgaatgagcaatattcagcatgtttctcctcttgtcaaccatcatgtcaagatccttcaac
CEgene26.04-50          ggctggaacagaagtttgttggtgcgtgacaaggtatggaagaagattat
CEgene26.05-50          tatgtggcgcgaatgagcaatattcagcatgtttctcctcttgtcaacca
CEgene26.04-40          ggctggaacagaagtttgttggtgcgtgacaaggtatgga
CEgene26.05-40          tatgtggcgcgaatgagcaatattcagcatgtttctcctc
CEgene26.04-30          gaacagaagtttgttggtgcgtgacaaggt
CEgene26.05-30          tatgtggcgcgaatgagcaatattcagcat
CEgene26.04-50HEG2      GgctGgaamCagaAgttTgttGgtgmCgtgAcaaGgtaTggaAgaaGattAt
CEgene26.04-50HEG4      GgctGgaamCagaAgttTgttGgtgmCgtgAcaaGgtaTggaAgaaGattAt
CEgene26.05-50HEG2      TatgTggcGcgaAtgaGcaaTattmCagcAtgtTtctmCctcTtgtmCaacmCa
CEgene26.05-50HEG4      TatgTggcGcgaAtgaGcaaTattmCagcAtgtTtctmCctcTtgtmCaacmCa
CEgene26.04-40HEG2      GgctGgaamCagaAgttTgttGgtgmCgtgAcaaGgtaTgga
CEgene26.04-40HEG4      GgctGgaamCagaAgttTgttGgtgmCgtgAcaaGgtaTgga
CEgene26.05-40HEG2      TatgTggcGcgaAtgaGcaaTattmCagcAtgtTtctmCctc
CEgene26.05-40HEG4      TatgTggcGcgaAtgaGcaaTattmCagcAtgtTtctmCctc
CEgene26.04-30HEG2      GaacAgaaGtttGttgGtgcGtgamCaagGt
CEgene26.04-30HEG4      GaacAgaaGtttGttgGtgcGtgamCaagGt
CEgene26.05-30HEG2      TatgTggcGcgaAtgaGcaaTattmCagcAt
CEgene26.05-30HEG4      TatgTggcGcgaAtgaGcaaTattmCagcAt

Target oligos           Sequence (LNA=uppercase, DNA=lowercase letters)

CEgene26.04-biotarget   accttgtcacgcaccaacaaacttctgttc
CEgene26.05-biotarget   atgctgaatattgctcattcgcgccacata
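
Because the synthetic targets are antisense oligonucleotides, each target sequence in Table 5 should be the reverse complement of the corresponding 30-mer capture probe. The Python sketch below illustrates that check; the function and dictionary names are hypothetical, and the sequences are copied from Table 5.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Reverse complement of a lowercase DNA sequence."""
    return seq.lower().translate(COMPLEMENT)[::-1]


capture_30mers = {
    "CEgene26.04-30": "gaacagaagtttgttggtgcgtgacaaggt",
    "CEgene26.05-30": "tatgtggcgcgaatgagcaatattcagcat",
}
targets = {
    "CEgene26.04-biotarget": "accttgtcacgcaccaacaaacttctgttc",
    "CEgene26.05-biotarget": "atgctgaatattgctcattcgcgccacata",
}

exon4_ok = reverse_complement(capture_30mers["CEgene26.04-30"]) == targets["CEgene26.04-biotarget"]
exon5_ok = reverse_complement(capture_30mers["CEgene26.05-30"]) == targets["CEgene26.05-biotarget"]
print(exon4_ok, exon5_ok)
```

Both comparisons hold for the sequences as listed, so each target is fully complementary to the 30-mer probe of its own exon and to the matching portion of the longer probes.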


Printing and coupling of the C. elegans gene T01D3.3/exon 4-5 microarrays 

The C. elegans gene T01D3.3/exon 4-5 capture probes were synthesized with a 5'
anthraquinone (AQ)-modification, followed by either a hexaethyleneglycol-2 or a
hexaethyleneglycol-4 (HEG2/HEG4) linker (Table 5). The capture probes were first diluted to
a 10 µM final concentration in 100 mM Na-phosphate buffer pH 7.0, followed by a two-fold
dilution series (10 µM, 5 µM, 2.5 µM, 1.25 µM, 0.625 µM, 0.31 µM, and 0.155 µM) and
spotted on Euray polycarbonate microarray slides using the Biochip Arrayer One (Packard
Biochip Technologies) with a spot volume of 3 x 300 pl and 400 µm between the spots. The
capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker
for 90 seconds at full power (Stratagene, USA). Non-immobilized capture probe
oligonucleotides were removed from the slides by washing the slides for 24 hours in milli-Q
H2O. After washing, the slides were dried in an oven at 37°C for 30 minutes, and stored in a
slide box until microarray hybridization.

Hybridization of the C. elegans microarrays and post-hybridization washes 

The slides were hybridized with a high (saturated) concentration of 1 µM of each gene
T01D3.3, exon 4 or 5 target oligo (Table 5) in 50 µl of hybridization solution, containing 25
mM HEPES, pH 7.0, 3 x SSC, 0.22% SDS, and 0.8 µg/µl of poly(A) blocker. The target
probes were filtered in a Millipore 0.45 micron spin column (Ultrafree-MC, Millipore, USA),
denatured by incubation at 100°C for 2 minutes, cooled at room temperature for 5 minutes, and
then carefully applied onto the prepared microarray. One-half of a cover slip was laid over the
microarray, and the hybridization was performed for 16-18 hours at 63°C in a hybridization
chamber (Corning, USA).

Following hybridization, the slides were washed sequentially by plunging gently in 1 x
SSCT (150 mM NaCl, 15 mM Sodium Citrate + Tween 20) at room temperature for one
minute, then in 0.2 x SSCT (30 mM NaCl, 3 mM Sodium Citrate + Tween 20) at room
temperature for one minute, and finally in Milli Q water, followed by drying of the slides in an
oven at 37°C for 30 minutes. The slides were Cy5 labelled using a Cy5-streptavidin target.

Thirty µl of a Cy5-streptavidin (2 ng/ml in 1 x SSCT) were carefully applied onto the
hybridized microarray and incubated one hour at room temperature before an additional
washing step was performed in 1 x SSCT (150 mM NaCl, 15 mM Sodium Citrate + Tween 20)



at room temperature for one minute, then in 0.2 x SSCT (30 mM NaCl, 3 mM Sodium Citrate +
Tween 20) at room temperature for one minute, and finally in Milli Q water. Following
washing, the slides were dried in an oven at 37°C for 30 minutes and stored in a slide box in
the dark until scanning.
Microarray data analysis

The C. elegans gene T01D3.3 exon 4-5 microarray was scanned in an ArrayWoRx
Scanner (Applied Precision, USA) using an exposure time of 5 seconds, resolution of 5.0, and
high-level sensitivity. The hybridization data were analyzed using the ArrayVision
image analysis software package 5.1 (IMAGING Research Inc., USA). As shown in Figs. 18A
and 18B, analysis of the hybridization data from the C. elegans gene 26/T01D3.3 array
demonstrates that both the DNA and the LNA capture probes for the C. elegans T01D3.3
exon 4 (Fig. 18A) and exon 5 (Fig. 18B) are highly specific, with a very low level
of cross-hybridization between their respective target oligonucleotides. The exon-specific
design of the oligonucleotide capture probes is thus validated. Capture probes of other lengths
and/or with other LNA substitution patterns can be used similarly.

Example 5: Detection of Alternatively Spliced Isoforms using Internal Exon-specific, and
Exon-Exon Junction-Specific (merged) LNA-modified Capture Probes

Oligonucleotide design for microarrays

Methods for designing exon-specific internal oligonucleotide capture probes have been
described in Example 2.

Design of the LNA-modified capture probes

For the LNA-modified oligonucleotide capture probes, every third DNA nucleotide was
substituted with an LNA nucleotide. The probes designed to capture the junction of the
recombinant splice variants were designed with LNA modifications in a block of five
consecutive LNA nucleotides, two on the 5' side of the splice junction and three on the 3' side
of the splice junction. All capture probes are shown in Table 6.
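
The block design for the junction probes can be illustrated with a short sketch. The function name, the `junction` index convention (index of the first base on the 3' side of the junction) and the toy sequence are all hypothetical and chosen only for illustration; the notation follows the document, with LNA bases in uppercase and LNA methyl cytosine written as "mC".

```python
def lna_block(seq: str, junction: int, n5: int = 2, n3: int = 3) -> str:
    """Mark a block of LNA bases around a junction.

    `junction` is the index of the first base 3' of the junction (an
    assumption of this sketch). `n5` bases on the 5' side and `n3` bases
    on the 3' side are written as LNA; everything else stays DNA.
    """
    out = []
    for i, base in enumerate(seq.lower()):
        if junction - n5 <= i < junction + n3:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


# Hypothetical 12-mer with the junction between positions 6 and 7 (1-based).
toy_probe = lna_block("aaggttccaagg", junction=6)
print(toy_probe)
```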

Table 6. Internal, exon-specific and merged, exon-exon junction specific oligonucleotide
capture probes.

Capture probes Sequence (LNA=uppercase, DNA lowercase letters)

gene78.01a cctgaaagtagatttgttatttccgaaacgccttctcccgttcttaagtc
gene78.01b catataccacaaatagtccctcaaaaatcacaagaaaactcacaacactg
gene78.03a gatttgcagcggtggtaaaaagtatgaaaacgtggtaattaaaaggtctc
gene78.03b ccaatgaaaactaatcaaaggtaaacgtggatcccatggcaattcccggg
caacactgcccagaggttcaatcgatccgatgatcctaatgaaggcgccc

gtccagtatcgtccatcatagtatcgataaatatgtgaaggaaatgcctg

caacactgcccagaggttcaatcgatgtgtgataggatcagtgttcaggg

gaaggcgaaggagactgctaatatcgataaatatgtgaaggaaatgcctg

mCctGaaAgtAgaTttGttAttTccGaaAcgmCctTctmCccGttmCttAagTc

mCatAtamCcamCaaAtaGtcmCctmCaaAaaTcamCaaGaaAacTcamCaamCacTg

GatTtgmCagmCggTggTaaAaaGtaTgaAaamCgtGgtAatTaaAagGtcTc

mCcaAtgAaaActAatmCaaAggTaaAcgTggAtcmCcaTggmCaaTtcmCcgGg

caacactgcccagaggttcaatcGATmCmCgatgatcctaatgaaggcgccc

gtccagtatcgtccatcatAGTATcgataaatatgtgaaggaaatgcctg

caacactgcccagaggttcaatcGATGTgtgataggatcagtgttcaggg

gaaggcgaaggagactgctAATATcgataaatatgtgaaggaaatgcctg


Printing and Coupling of the Splice Isoform-Specific Microarrays

The splice variant capture probes were synthesized with a 5' anthraquinone
(AQ)-modification, followed by a hexaethyleneglycol-2 (HEG2) linker. The capture probes were
first diluted to a 20 µM final concentration in 100 mM Na-phosphate buffer pH 7.0, and spotted on
the Immobilizer polymer microarray slides (Exiqon, Denmark) using the Biochip Arrayer One
(Packard Biochip Technologies, USA) with a spot volume of 2 x 300 pl and 300 µm between
the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in
a Stratalinker with 2300 µjoules (Stratagene, USA). Non-immobilized capture probe
oligonucleotides were removed from the slides by washing the slides two times 15 minutes in
1xSSC. After washing, the slides were dried by centrifugation at 1000 x g for 2 minutes, and
stored in a slide box until microarray hybridization.
Construction of Splice Variant Vectors

The recombinant splice variant constructs were cloned into the Triamp18 vector
(Ambion, USA). The constructs were sequenced to confirm their construction. The plasmid
clones were transformed into E. coli XL10-Gold (Stratagene, USA).
Triamp18/SWI5 vector construct

Genomic DNA was prepared from a wild-type standard laboratory strain of
Saccharomyces cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences,
USA) according to the supplier's instructions. Amplification of the partial yeast gene was
performed using standard PCR with yeast genomic DNA as the template. In the first step of
amplification, a forward primer containing a restriction enzyme site and a reverse primer
containing a universal linker sequence were used. In this step, 20 base-pairs were added to the
3'-end of the amplicon, next to the stop codon. In the second step of amplification, the reverse
primer was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme
site. The SWI5 amplicon contains 730 bp of the SWI5 ORF plus a 20 bp universal linker
sequence and a poly-A20 tail. The PCR primers used were YDR146C-For-EcoRI
(acgtgaattcaaatacagacaatgaaggagatga),
YDR146C-Rev-Uni (gatccccgggaattgccatgttacctttgattagttttcattggc), and
Uni-polyT-BamHI (acgtggatccttttttttttttttttttttgatccccgggaattgccatg).

The PCR amplicon was cleaved with the restriction enzymes EcoRI and BamHI. The
DNA fragment was ligated into the pTRIamp18 vector (Ambion, USA) using the Quick
Ligation Kit (New England Biolabs, USA) according to the supplier's instructions and
transformed into E. coli DH5α by standard methods.
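
The primer design described above can be checked with a small sketch: the forward primer should carry an EcoRI site (gaattc), the nested primer a BamHI site (ggatcc), and the nested primer should end with the same 20-nt universal linker that starts the first-step reverse primer. The helper name `sites_in` is an assumption of this example; the primer sequences are copied from the text.

```python
PRIMERS = {
    "YDR146C-For-EcoRI": "acgtgaattcaaatacagacaatgaaggagatga",
    "YDR146C-Rev-Uni": "gatccccgggaattgccatgttacctttgattagttttcattggc",
    "Uni-polyT-BamHI": "acgtggatccttttttttttttttttttttgatccccgggaattgccatg",
}

# Recognition sequences of the two enzymes named in the text.
SITES = {"EcoRI": "gaattc", "BamHI": "ggatcc"}

# First 20 nt of the first-step reverse primer: the universal linker.
LINKER = PRIMERS["YDR146C-Rev-Uni"][:20]


def sites_in(seq: str) -> list[str]:
    """Return the names of the restriction sites found in `seq`."""
    return sorted(name for name, site in SITES.items() if site in seq.lower())


for name, seq in PRIMERS.items():
    print(name, sites_in(seq), "ends with linker:", seq.endswith(LINKER))
```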

Construction of the Recombinant Splice Variant #1 (Triamp18/swi5-rubisco)

The Arabidopsis thaliana Rubisco small subunit ssu2b gene fragment (gil7064721) was
amplified from genomic DNA using primers named DJ305 (5'-
ACTATGATGGACGATACTGGAC-3') and DJ306 (5'-
ATTGGATCGATCCGATGATCCTAATGAAGGC-3'), containing ClaI restriction site
linkers. The purified PCR fragment was digested with ClaI and then cloned into the swi5
(gi:7839148) vector at the unique ClaI site (atcgat), giving each insert a flanking sequence from
the original yeast SWI5 insert (named exon01 and exon03, Fig. 19). The product was inserted
in the reverse orientation, so that the insert sequence is as follows:

atcgatCCGATGATCCTAATGAAGGCGCCCGGGTACTCCTTCTTGCATTCTTCAACTTC
CTTCAACACTTGAGCGGAGTCGGTGCATCCGAACAATGGAAGCTTCCACATTGTCC
AGTATCGTCCATCATAGTatcgat
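
The reverse orientation can be checked against the primers: the insert should begin with the ClaI site followed by the 3' part of DJ306, and end with the reverse complement of DJ305 followed by the ClaI site. The sketch below uses only the two ends of the insert as printed above; the variable names are assumptions of this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (lowercase output)."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp[base] for base in reversed(seq.lower()))


DJ305 = "ACTATGATGGACGATACTGGAC"
DJ306 = "ATTGGATCGATCCGATGATCCTAATGAAGGC"

# First 26 nt and last 28 nt of the insert sequence given in the text.
insert_5p = "atcgatCCGATGATCCTAATGAAGGC"
insert_3p = "GTCCAGTATCGTCCATCATAGTatcgat"

# DJ306 from its ClaI site onward should match the 5' end of the insert.
dj306_from_site = DJ306[DJ306.index("ATCGAT"):]
print(dj306_from_site.lower() == insert_5p.lower())

# The reverse complement of DJ305 should sit just before the 3' ClaI site.
print(reverse_complement(DJ305) == insert_3p[:-6].lower())
```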

Nucleotide sequence analysis revealed a difference between the sequence of A. thaliana
rubisco expected from the GenBank database and that obtained from all sequenced constructs
and PCR products. Position 30 in the Rubisco insert is "C" rather than the expected "A." This
SNP was probably created by PCR. None of the oligonucleotide capture probes used in the
example cover this region. The Rubisco sequence in GenBank is TCCTAATGAAGGCGCCA,
and the sequence obtained from the plasmid construct is TCCTAATGAAGGCGCCC.
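
The stated position of the mismatch can be reproduced from the start of the insert sequence given above. This sketch assumes that positions are counted 1-based from the first base of the 5' ClaI site (atcgat); the variable names are illustrative only.

```python
# Start of the Rubisco insert as printed in the text (ClaI site in lowercase).
insert_start = "atcgatCCGATGATCCTAATGAAGGCGCCCGGG"

genbank = "TCCTAATGAAGGCGCCA"    # sequence expected from GenBank
construct = "TCCTAATGAAGGCGCCC"  # sequence obtained from the plasmid construct

# Locate the sequenced 17-mer in the insert (0-based offset).
offset = insert_start.upper().find(construct)

# Indices within the 17-mer where the two sequences differ.
mismatches = [i for i, (a, b) in enumerate(zip(genbank, construct)) if a != b]

# Convert to a 1-based position in the insert.
position = offset + mismatches[0] + 1
print(position, genbank[mismatches[0]], "->", construct[mismatches[0]])
```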

Construction of the Recombinant Splice Variant #2 (Triamp18/swi5-lea)

The Arabidopsis thaliana Lea gene (gi 1526423) was amplified from genomic DNA
with primers named DJ307 (5'-GGAATTATCGATGTGTGATAGGATCAGTGTTCAG-3'),
and DJ308 (5'-AATTGGATCGATATTAGCAGTCTCCTTCGCC-3'), including the ClaI
linker sites as above. The PCR fragment was digested with ClaI and cloned into the yeast SWI5
IVT construct as above at the unique ClaI site.
The fragment was inserted in the forward orientation, resulting in the following insert
sequence:

atcgatGTGTGATAGGTTCAGTGTTCAGGGCTGTCCAAGGAACGTATGAGCATGCGAG
AGACGCTGTAGTTGGAAAAACCCACGAAGCGGCTGAGTCTACCAAAGAAGGAGCT
CAGATAGCTTCAGAGAAAGCGGTTGGAGCAAAGGACGCAACCGTCGAGAAAGCTA
AGGAAACCGCTGATTATACTGCGGAGAAGGTGGGTGAGTATAAAGACTATACGGT
TGATAAAGCTAAAGAGGCTAAGGACACAACTGCAGAGAAGGCGAAGGAGACTGCT
AATatcgat.

In vitro RNA Preparation from Splice Variant Vectors

In vitro RNA from the splice variants was made using the MEGAscript™ high yield
transcription kit according to the manufacturer's instructions (Ambion, USA). The yield of
IVT RNA was quantified using a Nanodrop spectrophotometer (Nanodrop Technologies, USA,
Fig. 19).

Isolation of total RNA from C. elegans

C. elegans wild-type strain (Bristol-N2) was maintained on nematode growth medium
(NG) plates seeded with Escherichia coli strain OP50 at 20°C, and the mixed stages of the
nematode were prepared as described by Hope (ed.) ("C. elegans - A Practical Approach",
Oxford University Press 1999). The samples were immediately flash frozen in liquid N2 and
stored at -80°C until RNA isolation.

A 100 µl aliquot of packed C. elegans worms from a mixed stage population was
homogenized using the FastPrep Bio101 from Kem-En-Tec for one minute, speed 6, followed
by isolation of total RNA from the extracts using the FastPrep Bio101 kit (Kem-En-Tec)
according to the manufacturer's instructions.

The eluted total RNA was ethanol precipitated for 24 hours at -20°C by addition of 2.5
volumes of 96% EtOH and 0.1 volume of 3M Na-acetate, pH 5.2 (Ambion, USA), followed by
centrifugation of the total RNA sample for 30 minutes at 13200 rpm. The total RNA pellet was
air-dried and redissolved in 10 µl of diethylpyrocarbonate (DEPC)-treated water (Ambion,
USA) and stored at -80°C.
Fluorochrome-labelling of the Target

Ten (10) µg total RNA from C. elegans and 1 ng of in vitro RNA from Splice variant #1
were combined with 5 µg anchored oligo(dT20) primer and DEPC-treated water to a final
volume of 8 µl. The mixture was heated at 70°C for 10 minutes, quenched on ice for 5
minutes, followed by addition of 20 units of Superasin RNase inhibitor (Ambion, USA), 1 µl
dNTP solution (10 mM each dATP, dGTP, dTTP and 0.4 mM dCTP), 3 µl Cy5-dCTP
(Amersham Biosciences, USA), 4 µl 5 x RTase buffer (Invitrogen), 2 µl 0.1 mM DTT
(Invitrogen), 400 units of Superscript II reverse transcriptase (Invitrogen, USA), and DEPC-
treated water to 20 µl final volume.

A parallel set-up was made with 10 µg total RNA from C. elegans and 1 ng of in vitro
RNA from Splice variant #2, labelling with Cy3-dCTP. Both cDNA syntheses were carried out
at 42°C for 2 hours, and the reactions were stopped by incubation at 70°C for 5 minutes,
followed by incubation on ice for 5 minutes.

Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR
columns as described below. The column was pre-spun for one minute at 1500 x g in a 1.5 ml
tube, and the column was placed in a new 1.5 ml tube. The cDNA sample was slowly applied
to the top center of the resin and spun at 1500 x g for 2 minutes. The eluate was collected. RNA
was degraded by adding 3 µl of 0.5 M NaOH. The solution was mixed well and incubated at 70
°C for 15 minutes. The solution was neutralized by adding 3 µl of 0.5 M HCl and mixed well.
Then, 450 µl 1xTE, pH 7.5 was added to the neutralized sample, and the sample was transferred
onto a Microcon-30 concentrator (prior to use, 500 µl 1xTE was spun through the column to
remove residual glycerol). The samples were spun at 14000 x g in a micro centrifuge for 12
minutes, and the volume was checked. Spinning was continued until the volume was reduced
to 5 µl. The labelled cDNA probe was eluted by inverting the Microcon-30 tube and spinning
at 1000 x g for 3 minutes. The Microcon filter was checked for proper elution.

Comparative Hybridization of the Splice Variant Microarrays and Post-hybridization washes

The Cy3- and Cy5-labelled cDNA samples were combined in one tube.
The following was added: 3.75 µl 20x SSC (3x SSC final, pass through 0.22 µ filter prior to use
to remove particulates), yeast tRNA (1 ng/µl final), 0.625 µl 1 M HEPES, pH 7.0 (25 mM final,
pass through 0.22 µ filter prior to use to remove particulates), 0.75 µl 10 % SDS (0.3 % final),
and DEPC-water to 25 µl final volume. The labelled cDNA target sample was filtered in a
Millipore 0.22 µ filter spin column (Ultrafree-MC, Millipore, USA) according to the
manufacturer's instructions, followed by incubation of the reaction mixture at 100 °C for 2-5
minutes. The cDNA probes were cooled at room temperature for 2-5 minutes by spinning at
maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was
carefully placed on top of the microarray spotted on Immobilizer™ MicroArray Slide, and the
hybridization mixture was applied to the array from the side. An aliquot of 30 µL of 3xSSC
was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide
was placed in the hybridization chamber (DieTech, USA). The chamber was sealed watertight
and incubated at 65°C for 16-18 hours submerged in a water bath. After hybridization, the slide
was removed carefully from the hybridization chamber and washed using the following
protocol.
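
The final concentrations quoted for the hybridization mixture follow from the usual dilution relation C1·V1 = C2·V2. The sketch below recomputes them for the 25 µl mixture; the function and dictionary names are assumptions of this example, and HEPES is expressed in mM (1 M = 1000 mM).

```python
def final_conc(stock: float, vol_added_ul: float, total_vol_ul: float) -> float:
    """Final concentration after dilution, from C1 * V1 = C2 * V2."""
    return stock * vol_added_ul / total_vol_ul


TOTAL_UL = 25.0

# component: (stock concentration, volume added in ul)
mix = {
    "SSC (x)": (20.0, 3.75),
    "HEPES (mM)": (1000.0, 0.625),
    "SDS (%)": (10.0, 0.75),
}

finals = {name: final_conc(stock, vol, TOTAL_UL) for name, (stock, vol) in mix.items()}
for name, value in finals.items():
    print(f"{name}: {value:.2f}")
```

The same relation reproduces the figures quoted for the 20 µl mixtures used in the later examples, for instance `final_conc(20.0, 3.0, 20.0)` for 3x SSC.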

The slides were washed sequentially by plunging gently in 2 x SSC/0.1% SDS at room
temperature until the cover slip fell off into the washing solution, then in 1x SSC pH 7.0 (150
mM NaCl, 15 mM Sodium Citrate) at room temperature for one minute, then in 0.2 x SSC, pH
7.0 (30 mM NaCl, 3 mM Sodium Citrate) at room temperature for one minute, and finally in
0.05 x SSC (7.5 mM NaCl, 0.75 mM Sodium Citrate) for 5 seconds, followed by drying of the
slides by spinning at 1000 x g for 2 minutes. The slides were stored in a slide box in the dark
until scanning.
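
The salt concentrations quoted for each wash buffer scale linearly with the SSC strength. A minimal sketch, taking the 1x composition stated in the text (150 mM NaCl, 15 mM sodium citrate) as the reference; the function name is an assumption of this example.

```python
# Composition of 1x SSC as stated in the wash protocol (mM).
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_composition(strength: float) -> dict[str, float]:
    """Salt concentrations (mM) of SSC diluted to the given strength."""
    return {salt: conc * strength for salt, conc in SSC_1X_MM.items()}


for strength in (1.0, 0.2, 0.05):
    comp = ssc_composition(strength)
    print(f"{strength} x SSC: " + ", ".join(f"{v:.2f} mM {k}" for k, v in comp.items()))
```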
Microarray data analysis

The splice variant microarray was scanned in a ScanArray 4000XL confocal laser
scanner (Packard Instruments, USA). The hybridization data were analyzed using the GenePix
Pro 4.01 microarray analysis software (Axon, USA).

In the data analysis, the experimental variation in the labelling efficiency between the
two fluorescent dyes was normalized (scaled) as follows. The average signal intensities from
the "exon1" and "exon3" internal capture probes (Table 6) were used to calculate a
normalization factor of 2.75. The signal intensity values from the Cy3 target were multiplied
by this factor.
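
One way such a scaling factor could be derived is sketched below. The text does not give the formula, so this example assumes the factor is the ratio of the mean Cy5 signal to the mean Cy3 signal over the internal reference probes; all intensity values are hypothetical and were chosen only so that the factor comes out at 2.75.

```python
from statistics import mean


def normalization_factor(cy5_ref: list[float], cy3_ref: list[float]) -> float:
    """Ratio of mean reference signals (assumed definition, see lead-in)."""
    return mean(cy5_ref) / mean(cy3_ref)


# Hypothetical signals from the "exon1" and "exon3" internal capture probes.
cy5_ref = [5200.0, 5800.0]
cy3_ref = [1900.0, 2100.0]

factor = normalization_factor(cy5_ref, cy3_ref)

# Scale every Cy3 intensity by the factor before comparing channels.
cy3_signals = [400.0, 1200.0, 50.0]  # hypothetical spot intensities
cy3_scaled = [value * factor for value in cy3_signals]
print(factor, cy3_scaled)
```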

Analysis of the data from the specific detection of the two recombinant splice variants in
a complex RNA pool demonstrates that the merged capture probes containing an LNA block
have significantly higher signals and a very low level of cross-hybridization, compared to the
DNA capture probes (Figs. 20A and 20B). In addition, the specific detection of the two
artificial splice variants #1 and #2 is validated with the results from LNA-modified
oligonucleotide capture probes. Capture probes of other lengths and/or with other LNA
substitution patterns can be used similarly. In contrast, the corresponding DNA oligonucleotide
capture probes fail to detect splice variant #1 (Fig. 20B).

Example 6: The Use of LNA-modified Oligonucleotides in Microarrays Provides Significantly
Improved Sensitivity in Expression Profiling

This example demonstrates the advantages of using LNA oligonucleotide microarrays in
gene expression profiling experiments. Capture probes for the Saccharomyces cerevisiae gene
SWI5 (YDR146C) were designed as 50-mer standard DNA and two different LNA-modified
oligonucleotides with LNA substitutions at every second or every third nucleotide position,
respectively, for comparison (Table 7). To assess the sensitivity of DNA versus LNA capture
probes, hybridizations with different amounts of biotin-labelled antisense oligonucleotides in a
10-fold dilution series were performed.

Design and Synthesis of the LNA Capture Probes

To design capture probes, regions with unique mRNA sequence of the selected target
genes were identified. Optimized 50-mer oligonucleotide sequences with respect to Tm,
self-complementarity, and secondary structure were selected. LNA modifications were
incorporated to increase affinity and specificity. The biotin-labelled antisense DNA target
oligonucleotide corresponds to the reverse complement sequence.
Printing of the LNA Microarrays

The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark)
using the Biochip One Arrayer from Packard Biochip technologies (Packard, USA). The arrays
were printed with a spot volume of 2 x 300 pl of a 10 µM (final concentration) capture probe
dilution. Four replicas of the capture probes were printed on each slide.
Hybridization with Biotin-labelled Antisense Oligonucleotide

The arrays were hybridized overnight using the following protocol. The desired amount
of biotin-labelled oligonucleotide was combined in one tube followed by addition of 3 µL
20xSSC (3xSSC final), 0.5 µL 1 M HEPES, pH 7.0 (25 mM final), 25 µg yeast tRNA (1.25
µg/µL final), 0.6 µL 10% SDS (0.3% final), and DEPC-treated water to 20 µL final volume.
The biotin-labelled target sample was filtered in a Millipore 0.22 micron spin column according
to the manufacturer's instructions (Millipore, USA), and the probe was denatured by incubating
the reaction at 100°C for 2 minutes. The sample was cooled at 20-25°C for 5 minutes by
spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA)
was carefully placed on top of the microarray spotted on Immobilizer™ MicroArray Slide, and
the hybridization mixture was applied to the array from the side. An aliquot of 30 µL of 3xSSC
was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide
was placed in the hybridization chamber. The chamber was sealed watertight and incubated at
65°C for 16-18 hours submerged in a water bath. After hybridization, the slide was removed
carefully from the hybridization chamber and washed using the following protocol. The
LifterSlip coverslip was washed off in 2xSSC, pH 7.0 containing 0.1% SDS at room
temperature for 1 minute, followed by washing of the microarrays subsequently in 1.0xSSC, pH
7.0 at room temperature for 1 minute, and then in 0.2xSSC, pH 7.0 at room temperature for 1
minute. Finally, the slides were washed for 5 seconds in 0.05xSSC, pH 7.0. The slides were
then dried by centrifugation in a swinging bucket rotor at approximately 200 G for 2 minutes.
To visualize the biotin containing duplexes, an aliquot of 40 µL of the 2 µg/ml streptavidin-Cy3
in 1xSSC+0.05 % Tween solution was applied to the slide as described for the hybridization
mixture above. The slide was incubated in a humidified chamber for 1 hour at room
temperature. The coverslip was washed off in 1xSSC+0.05 % Tween for 1 minute, followed by
a wash in 0.2xSSC+0.05 % Tween for 1 minute and then 10 seconds in MilliQ-water. The slide
was dried by centrifugation in a swinging bucket rotor for 2 minutes at 200 G.
Data Analysis

Following washing and drying, the slides were scanned using a ScanArray 4000XL
scanner (Perkin-Elmer Life Sciences, USA), and the array data were processed using the
GenePix™ Pro 4.0 software package (Axon, USA).

Results

Incorporation of LNA nucleotides at every second or third nucleotide position in
standard 50-mer expression array oligonucleotide capture probes results in a 2-7-fold increase
in fluorescence intensity levels using an unsaturated target concentration and hybridizing under
standard stringency conditions (Fig. 21). Thus, it can clearly be concluded that the LNA
oligonucleotides are more sensitive in expression profiling compared to DNA oligonucleotides.

Table 7. DNA and LNA-modified SWI5 (YDR146C) oligonucleotide capture probes. LNA
modifications are depicted by uppercase letters in the sequence; "mC" denotes LNA methyl
cytosine.

Oligo Name          Sequence

YDR146C-50          tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt
YDR146C-50_LNA2     TgGgAaTgGaAcGgGgAtTaTgGtTtmCgmCcAaTgAaAamCtAaTcAaAgGt
YDR146C-50_LNA3     TggGaaTggAacGggGatTatGgtTtcGccAatGaaAacTaaTcaAagGt
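
The substitution patterns in Table 7 can be reproduced with a short sketch: starting at the first nucleotide, every second (LNA2) or every third (LNA3) position is written as LNA, with LNA cytosine written as "mC". The function name `lna_mixmer` is an assumption of this example; the DNA sequence is the YDR146C-50 probe from the table.

```python
def lna_mixmer(dna: str, step: int) -> str:
    """Write every `step`-th base (starting at the first) as LNA.

    LNA bases are uppercase; LNA methyl cytosine is written "mC".
    """
    out = []
    for i, base in enumerate(dna.lower()):
        if i % step == 0:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


YDR146C_50 = "tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt"

lna2 = lna_mixmer(YDR146C_50, 2)
lna3 = lna_mixmer(YDR146C_50, 3)
print(lna2)
print(lna3)
```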


Example 7: The Use of LNA-modified Oligonucleotides in Microarrays Provides Significantly
Improved Sensitivity in Comparative Genome Hybridization (CGH).

This example demonstrates the advantages of using LNA oligonucleotide microarrays in
Comparative Genome Hybridization (CGH) experiments. Capture probes for all 23 exons of
the Menkes gene (ATP7A) were designed as 50-mer standard DNA and different LNA/DNA
mixmer oligonucleotides, respectively, for comparison (Fig. 25). The C6-amino-linked capture
probes were applied to Immobilizer slides and hybridized with patient DNA samples labelled
with a Cy3 fluorescent dye.

Design and synthesis of the LNA capture probes

To design the capture probes, regions comprising individual exons of the Menkes gene
were identified. The optimal 50-mer oligonucleotide sequences with respect to Tm,
self-complementarity, and secondary structure were selected for each exon. LNA modifications
were incorporated to increase affinity and specificity. A software tool, "OligoDesign", which
automatically designs capture probes that are optimized for sequence specificity, Tm,
self-complementarity, secondary structure, and LNA modifications, was used for oligonucleotide
design.
Results

Fluorescent Cy3 labelled patient genomic DNA was hybridized to microarrays spotted
with the CGH capture probes listed in Fig. 25. Compared to DNA capture probes, capture
probes with LNA in every second position (LNA-2) had a significantly better capture rate of
non-amplified labelled genomic patient DNA as shown in Figs. 22-24. Capture probes of other
lengths and/or with other LNA substitution patterns can be used similarly.

Example 8: Expression Profiling of Stress and Toxicity in Caenorhabditis elegans using LNA
Oligonucleotide Microarrays

This example demonstrates the use of the Exiqon C. elegans LNA tox oligoarray in
gene expression profiling experiments in the nematode Caenorhabditis elegans. The C. elegans
tox oligoarray monitors the expression of a selection of 110 genes relevant for general stress
response and for the metabolism of toxic compounds. Two different capture probes for each of
these target genes were designed and included in the LNA tox array. In addition, the C. elegans
LNA tox oligoarray contained capture probes providing control for cDNA synthesis efficiency
and the developmental stage of the nematode. Capture probes for constitutively expressed
genes for data set normalization were also included on the C. elegans LNA tox oligoarray.

Cultivation of C. elegans Worms

For all cultures, the sample was divided into two; one half of the sample was used
as the control, and the other was used as the treated sample. Worm samples were harvested and
sucrose cleaned by standard methods. For heat shock treatment, the heat shock sample was
added to S-media preheated to 33°C in a 1 L flask suspended in a water bath at 33°C; the other
sample was added to a 1 L flask with S-media at 25°C. Both samples were shaken at
approximately 100 rpm for an hour. For Lansoprazole treatment, 0.5 mL of 10 mg/mL
Lansoprazole (Sigma) in DMSO was added to each 500 mL volume of S-media culture after 28
hours of growth from L1. At the same time, 0.5 mL of DMSO was added to the control.
Incubation was for 24 hours. Samples were then harvested by centrifugation at 3000 x g,
suspended in RNALater™ (Ambion) and immediately frozen in liquid nitrogen.

RNA Extraction

RNA was extracted from the worm samples using the FastRNA® Kit, GREEN (Q-BIO)
essentially according to the supplier's instructions.

Design and Synthesis of the LNA Capture Probes

To design the capture probes, regions with unique mRNA sequence of the selected
target genes were identified. The optimal 50-mer oligonucleotide sequences with respect to Tm,
self-complementarity, and secondary structure were selected. LNA modifications were
incorporated to increase affinity and specificity.

Printing of the LNA Microarrays

The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark)
using the Biochip One Arrayer from Packard Biochip technologies (Packard, USA). The arrays
were printed with a spot volume of 2 x 300 pl of a 10 µM capture probe solution. Four replicas
of the capture probes were printed on each slide.

Synthesis of Fluorochrome Labelled First Strand cDNA from Total RNA

15 µg of C. elegans total RNA was combined with 5 µg oligo dT primer (T20VN) in an
RNase free, pre-siliconized 1.5 mL tube, and the final volume was adjusted with DEPC-water
to 8 µL. The reaction mixture was heated at +70°C for 10 minutes, quenched on ice for 5
minutes, and spun for 20 seconds, followed by addition of 1 µL SUPERase-In™ (20 U/µL,
Ambion, USA), 4 µL 5x RTase buffer (Invitrogen, USA), 2 µL 0.1 M DTT (Invitrogen, USA),
1 µL dNTP (20 mM dATP, dGTP, dTTP; 0.4 mM dCTP in DEPC-water, Amersham Pharmacia
Biotech, USA), and 3 µL Cy3™-dCTP or Cy5™-dCTP (Amersham Pharmacia Biotech, USA).
First strand cDNA synthesis was carried out by adding 1 µL of Superscript™ II (Invitrogen,
200 U/µL), mixing, and incubating the reaction mixture for 1 hour at 42°C. An additional 1 µL
of Superscript™ II was added, and the cDNA synthesis reaction mixture was incubated for an
additional 1 hour at 42°C; the reaction was stopped by heating at 70°C for 5 minutes, and
quenching on ice for 2 minutes. The RNA was hydrolyzed by adding 3 µL of 0.5 M NaOH, and
incubating at 70°C for 15 minutes. The samples were neutralized by adding 3 µL of 0.5 M HCl,
and purified by adding 450 µL 1xTE buffer, pH 7.5 to the neutralized sample and transferring
the samples onto a Microcon-30 concentrator. The samples were centrifuged at 14000 x g in a
microcentrifuge for ~8 minutes, the flow-through was discarded, and the washing step was
repeated twice by refilling the filter with 450 µL 1xTE buffer and by spinning for ~12 minutes.
Centrifugation was continued until the volume was reduced to 5 µL, and finally the labelled
cDNA probe was eluted by inverting the Microcon-30 tube and spinning at 1000 x g for 3 minutes.
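
As a consistency check on the labelling reaction, the component volumes can be tallied. The sketch below assumes the reverse transcriptase buffer is a 5x stock and counts only the first addition of reverse transcriptase; under those assumptions the components add up to 20 µL, giving 1x buffer and 10 mM DTT. The dictionary keys are descriptive labels chosen for this example.

```python
# Volumes (uL) of the first-strand labelling reaction as listed in the text.
components_ul = {
    "total RNA + oligo dT primer in DEPC-water": 8.0,
    "SUPERase-In": 1.0,
    "5x RTase buffer": 4.0,
    "0.1 M DTT": 2.0,
    "dNTP mix": 1.0,
    "Cy3- or Cy5-dCTP": 3.0,
    "reverse transcriptase (first addition)": 1.0,
}

total_ul = sum(components_ul.values())

# Final strengths from C1 * V1 = C2 * V2.
buffer_strength = 5.0 * components_ul["5x RTase buffer"] / total_ul  # in "x"
dtt_mm = 100.0 * components_ul["0.1 M DTT"] / total_ul               # in mM

print(f"total volume: {total_ul} uL")
print(f"buffer: {buffer_strength}x, DTT: {dtt_mm} mM")
```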
Hybridization with Fluorochrome-labelled cDNA

The arrays were hybridized overnight using the following protocol. The Cy3™ and
Cy5™-labelled cDNA samples were combined in one tube followed by addition of 3 µL
20xSSC (3xSSC final), 0.5 µL 1 M HEPES, pH 7.0 (25 mM final), 25 µg yeast tRNA (1.25
µg/µL final), 0.6 µL 10% SDS (0.3% final), and DEPC-treated water to 20 µL final volume.
The labelled cDNA target sample was filtered in a Millipore 0.22 micron spin column
according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by
incubating the reaction at 100°C for 2 minutes. The sample was cooled at 20-25°C for 5
minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific
Company, USA) was carefully placed on top of the microarray spotted on Immobilizer™
MicroArray Slide, and the hybridization mixture was applied to the array from the side. An
aliquot of 30 µL of 3xSSC was added to both ends of the hybridization chamber, and the
Immobilizer™ MicroArray Slide was placed in the hybridization chamber. The chamber was
sealed watertight and incubated at 65°C for 16-18 hours submerged in a water bath. After
hybridization, the slide was removed carefully from the hybridization chamber and washed
using the following protocol. The LifterSlip coverslip was washed off in 2xSSC, pH 7.0
containing 0.1% SDS at room temperature for 1 minute, followed by washing of the
microarrays subsequently in 1.0xSSC, pH 7.0 at room temperature for 1 minute, and then in
0.2xSSC, pH 7.0 at room temperature for 1 minute. Finally, the slides were washed for 5
seconds in 0.05xSSC, pH 7.0. The slides were then dried by centrifugation in a swinging
bucket rotor at approximately 200 G for 2 minutes. The slide was then ready for scanning.
Data analysis

Following washing and drying, the slides were scanned using a ScanArray 4000XL
scanner (Perkin-Elmer Life Sciences, USA), and the array data were processed using the
GenePix™ Pro 4.0 software package (Axon, USA). The data in each image were normalized so
that the ratio of means of all of the features is equal to 1.
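
One plausible reading of this normalization is sketched below: the per-feature Cy5/Cy3 ratios are divided by their mean, so that the mean ratio over all features becomes 1. This is an assumption made for illustration and is not a description of the internal algorithm of the analysis software; the intensity values are hypothetical.

```python
from statistics import mean


def normalize_ratios(cy5: list[float], cy3: list[float]) -> list[float]:
    """Scale per-feature Cy5/Cy3 ratios so that their mean equals 1."""
    ratios = [r / g for r, g in zip(cy5, cy3)]
    factor = 1.0 / mean(ratios)
    return [ratio * factor for ratio in ratios]


# Hypothetical per-feature mean intensities for the two channels.
cy5_means = [1200.0, 800.0, 1500.0, 950.0]
cy3_means = [1000.0, 1000.0, 1000.0, 1000.0]

normalized = normalize_ratios(cy5_means, cy3_means)
print([round(value, 3) for value in normalized])
```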

Results

Use of LNA-modified oligonucleotide capture probes in the C. elegans LNA tox
oligoarray clearly allows the identification of distinct expression profiles for C. elegans genes
relevant for general stress response and for the metabolism of toxic compounds.

Table 12. Expression profiling using LNA Oligonucleotide Microarrays. Log2 transformed
fold changes for five selected genes in the two expression profiling experiments.

Protein name (clone name)    Heat shock    Lansoprazole

HSP70 (F44E5.4/5)            4.11          nd
CYP37A (F01D5.9)             nd            0.98
Ubiquitin (M7.1)             0.16          -0.12
Histone 1Q (C01B10.5)        -1.49         nd
HSP90 (C47E8.5)              nd            -1.17

Table 13. LNA-modified oligonucleotide capture probes. LNA modifications are depicted by
uppercase letters in the sequence; "mC" denotes LNA methyl cytosine.

Oligo Name 

CEABC_C34G6.4__u293_LNA3 
CEABC_C34G6.4_u375_LNA3 
CEABCLF57C1 2.4_u1 5.LNA3 
CEABC_F57C 1 2.4_u480_LNA3 
CEABC_F57C12.5_u1 1 1_LNA3 
CEABC_F57C 1 2.5_u444 JJM A3 
CEABC_K08E7.9_d8_LNA3 
CEABC_K08 E7.9_u5 1 _LN A3 
CEABC_Y39D8C. 1 _u37_LNA3 
CEABC_Y39D8C.1_u422J_NA3 
CEADH_H24K24.3a_d3_LNA3 
CEADH„H24K24.3a_u50_LNA3 
CEAPEX_R09B3.1_u191_LNA3 
CEAP EX.R09B3. 1 _u37_LN A3 
CEAPO_C35D1 0.9_u1 5J_NA3 
CEAPO_C35D1 0.9_u609_LNA3 
CEAPO_C48D1 .2_U1 76_LNA3 
CEAPO_C48D1 .2_u23_LNA3 
CEAPO_F20C5. 1 _u453_LNA3 
CEAPO_F20C5. 1 _u96 J_N A3 
CEATPase_B0365.3_u31J.NA3 

CEATPase_B0365.3_u386_LNA3 


Sequence 

TgcmCatTgcAcgGgcActTgtTcgAtcTccTtcTgtTttActTttGgaTg 

TcaTtcTagGatTgcmCagAtgGttAtgAtamCtcAtgTcgGagAgaAagGa 

mCk;aAtgTtgTttAatTggTtgTaaTgtmCttGatGacmCtgnriCatAalmCatAt 

mCacAagAtcmCtgTgtTgtTctmCcgGaamCaaTgaAaaTgaActTagAtcmCa 

TacTtgTtcTcgAcaAagGttGtgTagmCcgAgtTtgAcamCtcmCgaAgaAa 

TgaActTggAtcmCclTctTtgmCatTtaGcgAtgAtcAaaTttGggAagmCg 

TcaTtaAnTtgTgtAgcTttmCttTctmCgaTttTtgmCacGatmCttTccmCc 

AggGtgmCctActAcaAacTgamCccAaaAgcAgaTgamCcgAgaAgaAatAa 

AttGaaAgcGacGcgGaaAgtGccAtgTatTtcTaaTttTgtTttmCttTa 

TtgTcaGcaTatmCaaGagTagAtaTggAagTggAtamCacTctGctAatmCc 

mCacmCttAttGcgTtcAatTttTgtTtcmCacmCtamCtamCtamCgaAtamCgtTg 

TcamCaaGggAgaGagTctGcgGtcGgtGctGgcGttmCgaGaaAatAtaAc 

mCatGcaTccmCgamCgaGaaGaaGtamCtcAttTtgGagTtaTctGgcGaaTt 

GacmCatGctmCcgGtcGtcAtgmCaaAtcGacTtcTaaAttGctTctGatTa 

TtgmCatGctGttAaaAccTatmCgtGtamCaaTatTgcmCtgTatAttmCccrnCt 

TggmCacAgcTtaAtaAcaAatTggAaaGtcGagGatTagTcgGtgTtgAa 

GacAcamCgcAaaGgaTatGgaTgtTgtTgaGctGctGacTgaAgtmCaaTa 

AgcAcgAaamCtcTgcmCgtmCtaAaaTtcActmCgtGatTcaTtgmCccAatTg 

AtgGtcAtamCtcTaaAatGggmCagAacTtcAacmCaaAtcAttmClcGtcAg 

AacmCcgAgcTtgmCcgnnCaaAgtGcaAgaAaaTtaTagAacGaaTgaAacAg 

GgaTggGtcGagmCgtGagAccTacTacTaaAgaAcaGctTgtGaaTctTt 

mCaamCgtTctmCgaTtcmCtamCggAcaAgaAtgGacmCtaTgciTiCaamCagAaa 


Ga 

CEATPase_C17H12.14_u356_LNA3  TgcTcgTtaTccAgcTatTttGaaGggActTgtmCatGcaAggAclTctTc
CEATPase_C17H12.14_u89_LNA3  mCcgTttAgaGctTatTgcTaamCcaGatTgtmCccAcaAgtmCagAacAgcTc
CEATPase_F55F3.3_u215_LNA3  TgamCggAcgmClamCtamCccAtaTgtAttTgtTccAtcTtamCcaGcaAccAa
CEATPase_F55F3.3_u275_LNA3

CEATPase_Y49A3A.2_u103_LNA3
CEATPase_Y49A3A.2_u272_LNA3
CECALR_Y38A10A.5_u238_LNA3
CECALR_Y38A10A.5_u296_LNA3
CECAT_Y54G11A.5b_u137_LNA3
CECAT_Y54G11A.5b_u189_LNA3
CECC_C03D6.3_u275_LNA3
CECC_C03D6.3_u430_LNA3
CECC_C07G2.3_d9_LNA3
CECC_C07G2.3_u44_LNA3
CECC_Y46G5A.2_u331_LNA3
CECC_Y46G5A.2_u385_LNA3
CECoA_C29F3.1_u316_LNA3
CECoA_C29F3.1_u392_LNA3
CECoA_F08A8.4_u1094_LNA3
CECoA_F08A8.4_u1260_LNA3
CECoA_F59F4.1_u109_LNA3
CECoA_F59F4.1_u424_LNA3
CECoA_Y25C1A.13_u115_LNA3
CECoA_Y25C1A.13_u451_LNA3
CECOL_C27H5.5_u493_LNA3
CECOL_C27H5.5_u680_LNA3
CECOQ_ZC395.2_u199_LNA3
CECOQ_ZC395.2_u400_LNA3
CECRYZ_F39B2.3_u171_LNA3
CECRYZ_F39B2.3_u222_LNA3
CECyclin_R02F2.1a_u24_LNA3
CECyclin_R02F2.1a_u312_LNA3
CECyclin_ZC168.4_u203_LNA3
CECyclin_ZC168.4_u273_LNA3
CECYP_B0213.15_u133_LNA3
CECYP_B0213.15_u202_LNA3
CECYP_B0304.3_u38_LNA3
CECYP_B0304.3_u89_LNA3
CECYP_C03G6.14_u706_LNA3
CECYP_C03G6.14_u768_LNA3
CECYP_C03G6.15_d9_LNA3
CECYP_C03G6.15_u148_LNA3
CECYP_C06B3.3_u102_LNA3
CECYP_C06B3.3_u474_LNA3
CECYP_C12D5.7_u399_LNA3
CECYP_C12D5.7_u65_LNA3
AgcTacTtcAttmCgamCaaGgaAcaTctmCggAaaAgtmCaaGtamCatmCccGg

AaaTtcAagGatmCcaGttGccGatGgtGaaGccAagAttmCgcAagGatTa 

mCgaTcgTttmCtgmC^AttmCtamCaaGacTglmCggTatGctmCaaGaaTatGa 

TcaGgaAcgAtcTttGacAacAltAtcAtcAccGacTctGttGagGagGc 

TgaActmCtamCtcTtaTgaAagmCtgGggAgcmCatmCggAttmCgaTttGtgGc 

GaamCttTgcAggGccGctmCggGgaAtgTcaTgaTttmCatTatTaaGggAa 

GtcAatTctGggAgaAggTgtTggAtamCcgGggmCtcGggAgaGaaTgtGc 

AtgTaaAgaAggAatGctTccmCgaAtgGatTggAtaTttAttTgtmCcaGa 

GgamCcgAaaTttGtgmCagmCatGtcGgamCacGaaAttGatGgtmCtcAttTt 

mCagAcamCgaAggTtamCgaTagAtaAccAtcTctmCaaAgtmCtaTcgAccTc 

mCgamCgaTgtGcgTgtTccTgamCgaTgaAagAatGggAtaTtaAgaAaamCc 

TtgTgcTccAtcGctGclmCcgmCttAcaGacTtgAcaAcgmCtcAccTltGc 

AatGagmCggTtgTgcmCgtGtgAcgTcamCttrriCgtrnCacAgtGttGctmCtamCt 

AaaTlgAcamCcaAtcAaaTctGtcTcaTctmCctGagGacrnGgtmCaamCttnriCg 

AatmCttTgtGtamCggAgaTggGgcAaaAggmCagmCaaGaaAgtAaamCcaAg 

AggAcaAggGgcActActGgcAcaGgcTttGatTatTgcAgtGagAtaTt 

TtaAtgGagGtgAcaAtgGgtTccTtgGatTcgAtaAatTccGagTgcmCc 

GctmCnmCtcmCagTggGctmCaaAatAgtmCaamCtcAacAgaTcgGaaGnmCt 

AaaGctTcgAgaTggmCacGttmCgtmCtgTatmCtcGtgAagAacTtaTtgmCa 

GatTcgnnCtgAacTttAtcAagAcgTggAatAtgAgcmCagmCtcmCtgTcgAc 

GatmCttAtcAccGcgTgcGatAttmCgaGtaGctTcamCagGatGcgAttTt 

GgaAagGaaGgaTccAttmCtcAgcTctGcamCttmCcamCcaTcaGagmCcaTg 

TggAtamCaaGgaGggAtcTggmCagTggTggAtcTggAagTggTggAtaTg 

TtgAaaGaamCtcmCttGccGacGatmCctGaaAcamCacAaaGaaTtgmCtgAa 

AtgTggGatGagGagAaaGaamCatTtaGatAcaAtgGaaAgaTtaGctGc 

AggmCtgAgcTctTggActTtgGcaTcaAcaTtgTctmCatTctTgaAggAa 

TtaTggTtamCag AagG agmCtgTttAcg GigTagmCatTggGaaTgtm Cttm Cc 

mCacTtcAacm CaamCtcmCgtG ttAatmCaa Gca AgcmCgcmCacmCatm CtaAtg 

Ag 

TctmCatTgcTcgTcgAggmCtamCcaAcaAacActGgcAatAccmCaaTtaAt 

TaaGaaAgtmCatTgaGgaTgcTglmCgcTttGctmCgcmCgaAgtmCtcGtaTa 

AagTtcAtcmCtgTtgAcgGaaTcgAggmCggAgaAtgmCtgTatmCggTcaTt 

AcaGgaAatAtgAttTtgGatTtcGatTttGaaTcgGttGgtGctGccmCc 

GctGagmCtgTatTtgGctAgtGaaAtgTgtGttTttGatActTtaAatGa 

AcgAggTttGgaTcamCaaTcaGaaTtcTgtGaaAtaAgcGttTttTggGa 

AgtTctmCggTctAacAgtGtcTccmCgtTgaAtaTtcTtgTaaAatmCacAc 

AtgAcc ActmC aaAatActGctAaaAgaTttGcaGcg GcaG aaGccGttAa 

TtgAtaTggmCtgTacmCtgTatGgtTttTgaGgamCgtTttTtaGgaGtcGa 

AttTatTcaTtcAtcmCatGtaAacTgtAtaTttTgaAttTgtGttGtaAa 

Gcx:AaaGcaGaaT1gTatTtgAtcTtcGgtAacmCttmCtcmCttmCgcTacAa 

AttTtgAatmCttmCtgGgaAaaTgcmCatmCcaniCtcGagAaamCcgTtcmCgtTt 

mCtaAcgGagGatmCtcGccAatTatmCttTgaGagAcaAaamCtgAaamCtcmCt 

AtcTagTccmCaaTgaAtcTccmCacAtgmCtgTtamCtcGtgAtgTtcAacTc 

TttTgcTttmCatmCgcAaaAgcTcaAgaTtamCacAtgTcaGgtmCaaGccAa 


LNA21/SKA/MSL


5/14/2003 


CECYP_C45H4.17_u27_LNA3
CECYP_C45H4.17_u598_LNA3
CECYP_C45H4.2_u110_LNA3
CECYP_C45H4.2_u429_LNA3
CECYP_C49C8.4_u363_LNA3
CECYP_C49C8.4_u883_LNA3
CECYP_C49G7.8_d6_LNA3
CECYP_C49G7.8_u795_LNA3
CECYP_F01D5.9_u374_LNA3
CECYP_F01D5.9_u46_LNA3
CECYP_F08F3.7_u25_LNA3
CECYP_F08F3.7_u401_LNA3
CECYP_F14F7.2_u397_LNA3
CECYP_F14F7.2_u68_LNA3
CECYP_F42A9.5_u435_LNA3
CECYP_F42A9.5_u55_LNA3
CECYP_K07C6.3_u3_LNA3
CECYP_K07C6.3_u354_LNA3
CECYP_K07C6.4_u118_LNA3
CECYP_K07C6.4_u87_LNA3
CECYP_K07C6.5_u7_LNA3
CECYP_K07C6.5_u99_LNA3
CECYP_K09A11.3_u362_LNA3
CECYP_K09A11.3_u48_LNA3
CECYP_K09A11.4_u238_LNA3
CECYP_K09A11.4_u68_LNA3
CECYP_K09D9.2_u151_LNA3
CECYP_K09D9.2_u866_LNA3
CECYP_T10B9.10_u410_LNA
CECYP_T10B9.10_u56_LNA
CECYP_T10B9.7_u102_LNA3
CECYP_T10B9.7_u267_LNA3
CECYP_T19B10.1_u100_LNA3
CECYP_T19B10.1_u319_LNA3
CECYP_Y49C4A.9_u121_LNA3
CECYP_Y49C4A.9_u413_LNA3
CECYP_ZK177.5_u394_LNA3
CECYP_ZK177.5_u445_LNA3
CEDAO_C47A10.5_d9_LNA3
CEDAO_C47A10.5_u269_LNA3
CEDC_C01A2.3_u373_LNA3
CEDC_C01A2.3_u96_LNA4
CEDC_C34F6.1_u301_LNA3
mCcgmCgamCttTaaAgaGaaGatmCatAaaTttGcaTtgTttTttGltTgtAt

mCgaGggTgaTtcGgaGacTrtmCagTaaTgtmCcaActTtcAaaTgtTtgmCa 

TagAtamCaaGatAcaTccmCtcAaaAgaAggmCctAccGtcAatGgcmCaaAg 

TcaAcgmCgtmCtaTaaAtgAatmCacAacGagGtaTcaAcaTtcTccmCccTg 

AtgmCtgAtgTtgAaaTtgmCtgGctAccGtaTtcmCaaAagAtamCtgTaaTc 

AtgAatmCcaTggmCttGgamCatmCtcmCcgTttTtcAagGgaTatAaaAatGt 

AtgmCaamCgaAttAgtGaaAaaTtcAtcmCtgGaaTaaAaaAtaAttmCtaAa 

AtcGctAcgAcaAtcTttmCcgAtgmCctTcgAagTttmCgaAagmCttTctmCt 

GagGtcGgtGgaGgaGgaAgtGgaAatTgamCggmCaaAatmCctGccmCaaGg 

mCccTctTtgGgaTttmCcamCtcAagTttActGttmCggmCagmCagTgaTatAa 

GagTtgGttmCcamCagAatGctTagGacGttTaaAttmCgtmCacAaamCttTt 

mCaaTatGgtTccmCatTttAgcAacTcaTatGaamCacAgaAgaTgtmCctTg 

GaaAaaGgcGtcGacAttTtaTgtGacAcgTggAcamCttmCacTatGacAa 

TaaTtgAatTacGggTctTttGtamCatAttAatTttAgtAlamCttTgtGa 

AtaTcaAtgmCaamCtaTtaAtgAatmCacAacGtcTtgmCcaAtcTtcTccmCg 

GgaGtgActAtgAaaGcaAagAgtTacmCgaTtgAaamCtgAaaGacAgamCa 

AatmCttTaaTgaTaaTttAtgGgaTctGtaTttmCtcTttmCtgTcaAtaAa 

Atg AgcmCcam CaaAtgTaaAagG atAcg AgaTtg Att m Cgg GaamCagTcaTg 

AtcmCtgmCgaTatGacAttAagmCcamCatGgtTctGaamCctTcaAcaGaaGa 

mCtgAacmCttmCaarnCagAagAtaAacTtcmCgtAtaGcgmCtgGaaAaamCtcmC 

t 

AttTaaAggAatTcamCagmCtcAaaAaaTaaTaamCtamCcgGttmCagAgaTt 

AatTtgAgcmCacAtgGcaAgtTatmCaamCagAggAgannCaaTgcmCgtAcaGt 

TgamCatTctActTaaAggGaaGaaAatAccAacTggTacmCctTgtAttTg 

TcamCcamCaaAgcmCatAcaTatGcgAgcTagTtcmCtcAggmCtgmCttAaamCc 

TtcGacAaaActAttTtgGaaAgaAcaAtcmCcaTtcAgtGtcGgcAaamCg 

TctGacAacAaaGccAtamCacGtgmCcgActAatTccAcaAtcAgcTagAa 

TtgGcaAaaGcaGaaTtgTatTtaAtcTttGgaAacmCtcmCttmCttmCgcTa 

TgaAtcTttmCaaActTatmCacTccTttTaaTacTacmCgtTccTgtTtgGa 

AttGagAttGtaTccAttGgcGtcTctTgtTcamCaaTcgAaaAtgTctmCa 

AacTgcTacTatTgcGccAtcAagTgtGctGctmCaaActTaaAtcmCagGt 

TtgAgamCagGaaAtaAgamCtaGaaTtcm CttTgaAacTggTgg GaaGtg mCt 

AagAtgTcaAagAatTcaAgcmCagAacGatGgtmCcamCcgAcgAgcmCatTa 

AttGaamCcaActmCtgAaaTatAatGacAcaAaamCcaTgtmCtgGaaGtgGt 

GgcAatGtgAcaAtaTctmCcaAtgGttmCttrnCacAgcAatmCatmCacGtgTt 

mCtaTtcAatmCgaTatTttAtcAcarnCcaTccAgtGctGgamCctmCcaTcaTt 

GlcTcaGagAtgTgtAaaTltActTccmCtgmCaaTttGttTcamCgcAacTa 

TtcmCgaAtgTttmCcaAttGggActGaaGttTcaAgaGtcAccmCagAaaAa 

GatmCcaGcaTctTccAagmCttAcaTtcmCtcmCgtGctTgtAtcAagGaaAc 

TttGaaAacmCtgTttTatTatTaaAatAgaTaaTtgAttAgtTctGtamCg 

AtamCgtTgcActGcaTccGgcTatGagGgaGccAaaAatmCttAggGgaGt 

GcamCttmCcaTtcAtcTctGcaGctAclAtgGctTtgGtgAcaAaaGttGg 

mCcgTccAaaAgaAtgmCkraTctmCacAagTctTgaAatmCttAtaAagGtaGt 

GagGgaTcaAcaGtaAccTcgTgcGgtAttGacAagGgaTgtmCcgGaaGg 




CEDC_C34F6.1_u450_LNA3
CEDC_F33D1 1 3_u126_LNA3
CEDC_F33D1 1 3_u14_LNA3
CEDC_F46E10.2_u392_LNA3
CEDC_F46E10.2_u54_LNA3
CEDC_F56G4.2_u382_LNA3
CEDC_F56G4.2_u82_LNA3
CEDC_M162.2_u103_LNA3
CEDC_M162.2_u480_LNA3
CEDC_R10E4.11_u274_LNA3
CEDC_R10E4.11_u397_LNA3
CEDC_T04C9.1_u321_LNA3
CEDC_T04C9.1_u64_LNA3
CEDC_W02A2.3_u32_LNA3
CEDC_W02A2.3_u374_LNA3
CEDC_W05G11.3_u153_LNA3
CEDC_W05G11.3_u51_LNA3
CEDC_ZK863.5_u256_LNA3
CEDC_ZK863.5_u324_LNA3
CEEPHX_Y55B1BR.4_u161_LNA3
CEEPHX_Y55B1BR.4_u93_LNA3
CEER_18S_u388_LNA3
CEER_18S_u82_LNA3
CEER_26S_u342_LNA3
CEER_26S_u38_LNA3
CEFOXO_R13H8.1b_u331_LNA3
CEFOXO_R13H8.1b_u393_LNA3
CEGAPDH_K10B3.7_u21_LNA3
CEGAPDH_K10B3.7_u727_LNA3
CEGBA_F11E6.1a_u232_LNA3
CEGBA_F11E6.1a_u451_LNA3
CEGLU_C02A12.1_u264_LNA3
CEGLU_C02A12.1_u55_LNA3
CEGLU_C46F11.2_u271_LNA3
CEGLU_C46F11.2_u45_LNA3
CEGLU_F26E4.12_u109_LNA3
CEGLU_F26E4.12_u480_LNA3
CEGLU_R07B1.4_u166_LNA3
CEGLU_R07B1.4_u38_LNA3
CEGLU_T09A12.2_u220_LNA3
CEGLU_T09A12.2_u335_LNA3
GatGgtTctTcgAtcGcaAacAaaAcaGatGtgmCtcmCatTtamCatAcgGa

AtgGagAaaAtgGatmagAtgGagTtgmCagGaaGtgAtgGagmCtcmCagGa 

TgaAtcTccAtaAatTatTcaAtgTttmCcaAatAttTaaTttAtcAatTg 

Gc^CaamCacGgtAggAtcmCtaTggAacmCgtmCggAggAgcAggmCctmCggA 

9 

mCgtGacAacrnCtcTtaTttAttTctGtaAaamCtgAttmCgcmCaaActTtlGt 

GaaGcnTtcAaamCcaAatGagTtcrriCttmCccGgaAtcrnCcaAagAatAccAa 

AcaAtgAaaAgaGagGatGgaAagGaaAtcGaaGtcTctGttmCttGacGa 

GatGagGtamCatAacTtlGtgTgcAgtTatAggmCcaTctAcaGtamCctGc 

TtcmCatmCatmCacTaamCcgAttGtcmCtgAcaTtgAtgGccAaamCcaGggAa 

TcamCatTatmCgaAcaAgtActAgtAagmCatGctGtgAtgGagTgcmCgcTa 

mCacGgaGatmCacGacAtcAaaGcgGatTgcTtaGagTgtGgaAacmCgtmCt 

ActAtcTacGtgGcamCgtTggActmCatmCatmCgaTggGaamCgamCgtAtaAg 

TctmC^gGccAgtTcamCttTgtGatmCaaTctmCagAttmCgtmCcamCacAagAt 

mCtamCttmCogmCaaGaaGgcmCcgTcgTttmCtaAtcGatmCgaAcaTctmCacA 

c 

AtgGatGa!rnCgamCc«ActTgcmCacTgamCccAcaAtcm(^gmCacTcamCtam 
Cc 

AagAcgGagAggmCtgGagAgaAcgGtamCcgAtgGagAgcrnCagGaamCtgAt 

mCcamCccAggAggAggGatAcaAgaGaaGaaAgtAcaGatTctmCcaActAa 

AgtTtcAcamCttmCttTttGccGttTtgGttmCccGtlAtcAatmCcaTtgAt 

mCttTtaTatTctmCatmCaaTttGttTccTacTtgGtcAgcTgaGgaTcgTt 

TtcGgcAcaAatGgaGcaAaaGtaTcgTggTtaTtgTgaTgcGatTatTc 

mCta mCtaTgaAtgAgcTc amCtgGacTcaTttAtc AacTcgAgtmCaa AagmCk: 

GttGgcGaaTctTcgGgtTcgTatAacTtcTtaGagGgaTaaGcgGtgTt 

GaamCtg AttmCgaGaaGagTggGgam CtgTcgmCttm CgaG gtTtaAcgActTc 

TgtTatTgcGaaAgtAatmCctGctTagTacGagAggAacAgcGggTtcAa 

TgcAtamCgamCttGgtmCtcTtgGtcAagGtgTtgTatTcaGtaGagmCagTc 

TgtGrtmCagAatmCcamC^mCttmCgaAatmCcaAttGtgmCcaAgcActAacTt 

TtaAgamCggAacmCaaTtgmCtcmCacmCacmCatmCatAccAcgAgtTgaAcaGt 

AcaTtgmCtamCcaAggmCctAagmCcgmCttmCaaAttmCtcTaaGtcTgaAalGa 

GttGagTccAccGgaGtcTtcAccAccAtcGagAagGccAatGctmCacTt 

AgtAaaTtcmCttmCcamCgtGgaTctActmCgtGtgTtcAcaAagAtcGagGg 

GgtmCcaAtaAtgGgaGacTggTtcmCgcGcaGaaAgtTatGcaGatGatAt 

AgaAaamCttmCgtTggAccmCtgmCtaAggAgaAgtAttTcaAgcTtcTgaGc 

GagmCacmCcgAagmCtcAagmCcaTatTtgGaaAcaAgamCcaTacTctTcaAa 

GttAccmCtcTacAaaTctmCgcTtcAatmCcaAtgTtgTtcGcaGtcAccAa 

mCcgAagAgcTcgTtamCtaTgcGagGagGtgTgaAgcmCggAatAatTttTt 

AagTtcTtgGltGgamCgcGalGggAaaAttAtcAagAgaTtlGgamCcaAc 

AcgAttTcaAcgTcaAaaAtgmCtaAtgGtgAtgAcgTgtmCacTttmCggAt 

AccTggGttGatGttTttGcgGctGaaAgtTtcTccAagmCtcAttGatTa 

GaaGtamCgtmCtcmCcaAagAaaAgcTacmCccAgcTtaAggmCatTgcAcaAt 

GcgmCcaGatAtgTatTcaAagAtcGagGtaAatGgtmCagAacActmCatmCc 

AatmCtamCagGgaAaaAggAttTcgAgtTgcmCgcGttTccAtgmCaaTcaAt 




CEGLU_T28A11.11_u299_LNA3
CEGLU_T28A11.11_u54_LNA3
CEGPD_B0035.5_u256_LNA3
CEGPD_B0035.5_u478_LNA3
CEHSP_C09B8.6_d8_LNA3
CEHSP_C09B8.6_u286_LNA3
CEHSP_C12C8.1_u127_LNA3
CEHSP_C12C8.1_u1531_LNA3
CEHSP_C47E8.5_u310_LNA3
CEHSP_C47E8.5_u361_LNA3
CEHSP_F26D10.3_u276_LNA3
CEHSP_F26D10.3_u397_LNA3
CEHSP_F43D9.4_u169_LNA3
CEHSP_F43D9.4_u275_LNA3
CEHSP_F44E5.4/5_u123_LNA3
CEHSP_F44E5.4/5_u380_LNA3
CEHSP_F52E1.7_u175_LNA3
CEHSP_F52E1.7_u448_LNA3
CEHSP_F54D5.8_u252_LNA3
CEHSP_F54D5.8_u318_LNA3
CEHUS_H26D21.1_u117_LNA3
CEHUS_H26D21.1_u478_LNA3
CEMRE_ZC302.1_u169_LNA3
CEMRE_ZC302.1_u292_LNA3
CEMTL_T08G5.10_d127_LNA3
CEMTL_T08G5.10_u45_LNA3
CENAP_D2096.8_u356_LNA3
CENAP_D2096.8_u70_LNA3
CEPAI_F56D12.5_u241_LNA3
CEPAI_F56D12.5_u301_LNA3
CEPDI_C07A12.4_u28_LNA3
CEPDI_C07A12.4_u433_LNA3
CEPDI_C14B1.1_u119_LNA3
CEPDI_C14B1.1_u358_LNA3
CEPGK_T03F1.3_d9_LNA3
CEPGK_T03F1.3_u424_LNA3
CEPON_E01A2.7_u223_LNA3
CEPON_E01A2.7_u79_LNA3
CEPPGB_F13D12.6_u44_LNA3
CEPPGB_F13D12.6_u440_LNA3
CEPPS_T14G10.1_d2_LNA3
AgaTggmCaaAgaAgcAtamCatAacTgaAacTctTccmCggGgaGctActAc

TgaAtaAacGggmCcgAacTaaAtcrnCatTcgTcaGtgGaaAtgGgaAacAa 

GtcmCgtmCttmC^tGatGctTatGaamCgcmCtaTttmCtcGaaGtaTtcAtgGg 

TgtGgaAaaGctmCtcAacGagAagAaaGcaGaaGttmCgtAtamCaaTtcAa 

AtaTcgmCk:gmCkrtGctTccTcafrCcaAccmCgaAtaAcgmCaamCaaAaamCtlTa 

AagAgcmCcamCtcAtcAagGatGaaAgtGatGgaAagActmCtlmCgtmCtcAg 

mCaaGatAttTtaAcaAaaAtgmCatrnCaamCaaGaaGccmCaaTcaGgtTccGg 

mCttGggmCatTctGtamCggGatGctGtcAttActGtgmCctGcaTatTttAa 

AagAagmCatmCtcGaaAtcAacmCcaGacmCacGctAtcAtgAagAcamCttmCg 

AtgAaaGctmCaaGctmCttinCgtGatTccTctActAtgGgaTacAtgGccGc 

TtaAgcAgamCcaTtgAggAcgAgaAgcTcaAggAtaAgaTcaGccmCagAa 

mCgtmCttTccAagGatGacAttGaamCgcAtgGtcAacGaaGctGagAaaTa 

GtcGacTtgGctmCacAtcmCacAccGtcAtcAacAagGaaGgamCagAtgAc 

mCaaTctTgaGggAcamCgtTctrnCacmCatTgaGggAcamCcamCgaGgtmCaa 

Ga 

TcamCtaAaaTgcAccAatmQgGacAatmCttmCtgmCttmCtgmCtgGatGcgmCt 

TcaTga AgcTaa AcaAttmCg aAaaGgaAgaTggTga Aca Acg G gaAcgTg 

AagTatAacmCttmCcaAcaGggGtcmCgtmCcaGaamCaaAtcAagTccGaaTt 

TttAacmCatGgcmCgcAgaTtcTtcGatGacGtcGacTttGatmCgcmCacAt 

GcgTcgAaaAgaTctmCccTgaAgtmCtgrnCatTgamCtgGccTtgAtaTtaTg 

AcaTagTctTcgTcaTcaAggAlaAgcmCacAccmCgaAatTcaAgcGagAg 

Tcgm CcaAcam CtcGgamCacGtg mCcaAaaTgaAtaTcaTctmCaaAtcGaaTg 

GtcGaaGttAgaAatmCcaGaaGccGatAttGttTctmCatmCaaAttmCcaAt 

ActActmCgtGgaAgaTccAatAaaGttGttTcaAcgmCgamCaaAtcGalTc 

GgcAgtGaaGatGaaGtgGcaAatTctGatGaaGaaAtgGgaAgcAgtAt 

TtgTcaAcgAccAgaAgcAaaAatTatGggAatmCgcGatAaaAttmCaaGg 

GatGcaAgtGtgmCcaActGcgAatGtgmCtcAggmCtgmCtcAttAatTtgAa 

GacGatAtgTtcGatTtcmCcaGgaGagGacGgtGatGatGtgTcaGacTt 

GacGatAtgTtcGatTtcmCcaGgaGagGacGgtGatGatGtgTcaGacTt 

GagGtcGtcGtaAtcmCacAagGctmCcaAgaAagmCaaGtgmCtcGacAttTc 

GatActTttGgcAagmCtcGttmCcaAtcAagAagGagGlcAtcmCcaGatmCg 

GatGagGagGgamCacAccGagmCtcTaaAtcmCacAttmCcaAtamCagTtcAa 

mCtlAtgTccGaaGatAtcmCcaGagGatTggGacAagAacmCcaGtcAagAt 

TacmCccAgtmCgamCtaTgaTggAgamCagAaamCctmCgaGaaGttmCgaAgaA 

t 

mCtcGtcGccTccAacTtcAacGaaAtlGccmCltGatGaaAccAagActGt 

TtcTatTgtTtaTtcmCttGccmCaaTagTgtAttTgtAttTatTctTtcTc 

mCaaAtcmCatmCtcmCcaGtgGatTtcGtcAttGctGacAagTtcGccGagGa 

GttTctGatTcgAcamCttTatGgamCcaTctmCaaGttmCtgmCgaGttTcITt 

GggAaamCaaAtgAttGttGgtAcaGtaGccmCgcmCctGctAttmCacTgtGa 

mCgaGcamCatmCatmC^AtcGttmCctGttmCaamCaaGgcmCttmCtaAtcGttA 

9 

TgaTgaGagmCccAgtAacmCaaTtaTttGaamCcgTcaGgaTgtGcgTaaGg 
mCgtmCtaAtcGaaGaaGggGatmCgtGggrnCaaTcaTaamCtaAttAacmCttmCa 




CEPPS_T14G10.1_u240_LNA3
CEPRDX_R07E5.2_u405_LNA3
CEPRDX_R07E5.2_u42_LNA3
CEPYC_D2023.2_u256_LNA3
CEPYC_D2023.2_u427_LNA3
CERAD_F10G7.4_u169_LNA3
CERAD_F10G7.4_u267_LNA3
CERAD_F32A11.2_u250_LNA3
CERAD_F32A11.2_u380_LNA3
CERAD_T04H1.4_u274_LNA3
CERAD_T04H1.4_u375_LNA3
CERAD_W06D4.6_u325_LNA3
CERAD_W06D4.6_u34_LNA3
CERAD_Y116A8C.13_u289_LNA3
CERAD_Y116A8C.13_u59_LNA3
CERAD_Y39A1A.23_u221_LNA3
CERAD_Y39A1A.23_u276_LNA3
CERAD_Y41C4A.14_u509_LNA3
CERAD_Y41C4A.14_u731_LNA3
CERAD_Y43C5A.6_u131_LNA3
CERAD_Y43C5A.6_u429_LNA3
CERFC_F31E3.3_u128_LNA3
CERFC_F31E3.3_u55_LNA3
CERPL_K11H12.2_d1_LNA3
CERPL_K11H12.2_u172_LNA3
CERT_F36A4.7_u1396_LNA3
CERT_F36A4.7_u2302_LNA3
CERT_F36A4.7_u289_LNA3
CERT_F36A4.7_u2919_LNA3
CERT_F36A4.7_u4269_LNA3
CERT_F36A4.7_u5485_LNA3
CESLC_F52F12.1a_u249_LNA3
CESLC_F52F12.1a_u76_LNA3
CESLC_K11G9.5_u400_LNA3
CESLC_K11G9.5_u462_LNA3
CESLC_Y32F6B.1_u179_LNA3
CESLC_Y32F6B.1_u280_LNA3
CESLC_Y37A1C.1a_u104_LNA3
CESLC_Y37A1C.1a_u404_LNA3
CESLC_Y70G10A.3_u383_LNA3
CESLC_Y70G10A.3_u46_LNA3
mCaaTggmCtcmCagGtcTttmCtgmCtcTtcAtaTacTlcmCatTccGagTtgmCt

GttmClcTlgGagmC^gAagTtgTcgmCgtGctmCgtGtgAttmCtcActTctmCt 

TcgmCtamCcaGcaAggAatActTcaAcaAggTcaAcaAgtGatmCacAcaGa 

AagGaaAttGtaActmCgcmCcaAgaGctmCtcmCcaGgtGtcmCgtGgamCatAt 

TtgActGgaTtgGagAttGcgGaaGaaGttGatGttGaaAtcGagAgtGg 

GccAagTctmCaaGcaAtaAgtGttGatmCaaTcaGagmCcaTacGgaGagAt 

AtaTtgAgamCttmC^gGacAagmC^gActTctmCatmC^gTcamC^gmCaamCtgm 

Cc 

GatmCcgmCagAgaAtcGagTatTtcmCtcTcgAgamCccAtgGatAtcAacTg 

TccGttAagAagmCtcActGgaAaaAcamCacGgcTcgAacGaaAttGgaAt 

AatTtgGatGagAgcAaaGtgGaaGgaAtgGctAtcGttTtgGcaGatAt 

GtgmCtgGtcAaaAaaTgcTtgmCttmCgtTgcTtaTtcGcaTtgmCacTcgmCa 

mCttmCgaGaamCtcTtcAagTtgGaaTcaAcaGtgGcaTcgGatAcamCatGa 

GtgmCctTctGaaGccGaaGaaAacGacGatTagTtaAatGttTccAagTt 

GatAaaAtcGatAgcGacGacGatGagGaaGccGatGatGagGagmCtcGa 

GcaGgtGgaTacGgaTgtGgaGctGacTttTgcGttTtaTcaAgaAtcTc 

TccmCgtAgaAgtAgaAatGctAgaAgaAccTgaAcaAgaAgaTcaAgaAa 

TgcAagAtgTcaGtaTtgAaamCaaTtcmCtgTagAgamCccmCcgAagAaaAt 

AgtmCtcGtaTccGggAatGttTcaGccTgtGaaAatGctTgtTgaAgamCg 

mCttmCaaAacmCgtmCgcTttTaaGgaTacAggAacGtgGcamCgcTtcmCgaGg 

mCagAttGtamCctTcgAaaAggAaaAggAgaGaaTcgmCgtmCgcAaaAatGg 

TgaTggmCttTgaTtaTtcGagmCagGagmCaaTgaTgtmCcgAgaGtcGttAt 

m CaaTgamCgaGaaTatTggAgtAatGgg GaaActGgtTgcGacTtgmCg aAa 

TtgGaaAacAatmacmCtcGacTttmagmCtcActmCttmCgtGaaActAtcmCa 

TctTgtTatTttAttTtgTttTggGctTgtTccGaaAatGaaAtgGttGt 

mCaaTggAtcAccAagmCcaGttrnCacAagmCacmCgtGagmCaaAgaGgamCtc 

Ac 

mCttTgtGatGtgAtgActGcgAagGgamCacTtgAtgGctAttAcgAgamCa 

GagmCcaGclActmCagAtgAcamCtcAacAcgTtcmCatTatGcaGgaGttTc 

TacAclmCcaTccTcgmCcgAcaTacAatmCcaAcaTctmCcamCgcGgaTtcTc 

AtgGagAagAtgGttTggAtgGaaTgtGggTtgAgaAtcAgaAtaTgcmCg 

Aac mCggG atAccGtgTcg AacG tcAcaTg a Aag AtgGcg AtaTaaTcgTc 

GagGagAttAaamCgcAtgTcaGtgGctmCatGtcGagTttmCcaGaaGtcTa 

AgaTatTgcmCtcTacTtaTcaTggGccTgaTggfnCtlTgtmCtgmCcgGtaTt 

GaaTctmCaamCcamCttmCtgGaamCccmCatAcamCcaAtgGatAgaAgamCgg 

Ag 

GttGttmCttTttTccGtgAtcTttTcaTgtTtaTgtrnCtgAacGtgGcaGg 

GacTcgTtgGtgTctTgcTagGatGtcTtgGgtTcaTtcmCtcAatmCgtTg 

GlamCtgGgcTcgAggGctGaaActAatmCgaAgaAgaAacTccAgaAgaTa 

GgaTcaTgcTctGttTacGacActGatGagTtaAgaGtcAgamCtgmCacGt 

inCgaTggTtcTtcTcgTctAtcAtaTcgGggTagTtgmCcgAagTgtTgaAa 

mCaaAtcGaamCtgGtaTaaAggAggAccGacGgaGacGaaTttGaamCgaGa 

AttmCgaTcaAagAacTclGgcTctmCggmCgtTaamCtgGacAttTgtTcgTc 

mCtcmCccGagmCagGcgAttAttmCacGctAgtTatGctmCaaAtgTgaTctGt 
CESOD_C15F1.7_u435_LNA3
CESOD_C15F1.7_u9_LNA3
CESOD_F10D11.1_u326_LNA3
CESOD_F10D11.1_u477_LNA3
CESULT_EEED8.2_u316_LNA3
CESULT_EEED8.2_u82_LNA3
CESULT_Y113G7A.11_u252_LNA3
CESULT_Y113G7A.11_u96_LNA3
CESULT_Y67A10A.4_u108_LNA3
CESULT_Y67A10A.4_u327_LNA3
CETOPO_K12D12.1_u398_LNA3
CETOPO_K12D12.1_u449_LNA3
CETOPO_M01E5.5b_u256_LNA3
CETOPO_M01E5.5b_u429_LNA3
CEUbi_F25B5.4_u186_LNA3
CEUbi_F25B5.4_u2_LNA3
CEUbi_F29B9.6_u145_LNA3
CEUbi_F29B9.6_u230_LNA3
CEUbi_M7.1_u239_LNA3
CEUbi_M7.1_u53_LNA3
CEUGT_F39G3.1_u40_LNA3
CEUGT_F39G3.1_u466_LNA3
CEUGT_M88.1_u480_LNA3
CEUGT_M88.1_u72_LNA3
YAL009W_u145_LNA3
YAL009W_u341_LNA3
YAL059W_u262_LNA3
YAL059W_u51_LNA3
YER109C_u109_LNA3
YER109C_u436_LNA3
YHR152W_u128_LNA3
YHR152W_u510_LNA3
YKL130C_u211_LNA3
YKL130C_u85_LNA3
YKL178C_u199_LNA3
YKL178C_u367_LNA3
YLR443W_u179_LNA3
YLR443W_u86_LNA3
YOR092W_u251_LNA3
YOR092W_u82_LNA3
YPL263C_u132_LNA3
YPL263C_u257_LNA3
mCcgGtamCtaTctGgaTcamCacAgaAgtrriCcgAaaAtgAccAggmCagTtaTt

mCccAglGacTacmCtgAatmCgcGtcTclGaaTctmCcamCacAatTccTacTa 

GgaGttGctmCacmCgcAatTaaGagmCgamCttmCggAtcTctGgaTaaTctTc 

AaaTtgAggAaaAgcTtcAcgAggmCggTctmCcaAagGaaAcgTcaAagAa 

mCaaTcgTacmCatGaaAgaAgtTggAagmCcamCgtGcaAgaGaaGaaAtcniCa 

AagAagAttmCctGacmCagAgaGacTcarnCgtGctTacrnCcaAgaAgcAtcTa 

AgcAnGgtGgaAatAcgAaaTggiriCatGggAagAgaAacmCccTctmCaaTt 

mCtgGttAcgGtaGtgTatG gtmCccTgtmCd mCtcAgaAtg mCaaAtaTgl mCg 

TctAcgTcgAtgGaaAagmCcgAttTaamCaaTcaAagmCcaAcaAcgmCagTt 

GgaAagGtgmCcaAaaAgtTgarnCagmCaaTtgGagGatmCttAttmCatTgcmCa 

AgaTgaTgaTgaAgtTccTgcAaaGaaGccTgcTccAgcGaaGaaAgcTg 

AaaAccTcgTacTggAaaAggAgcTgcGaaAgcGgaAgtTatmCgaTttGt 

GagAagGccmCagAagAagTacGacAgamCtgAagGagmCagTtgAaaAagTt 

TtcTgtmCatAcaAtcGtgmCtaAtcGgcAggTtgmCgaTccTttGtaAccAt 

AagmCttmCggAcamCcaTtgAgaAtgTcaAagmCcaAaaTccAggAtaAggAg 

AatmCgaAccmCatmCaaTtcActmCgtTatTccTcxiTcgAtcTccGttmCaaGt 

mCtgAacmCatmCcaAatAttGaaGatmCcaGctrnCagGctGaaGccTatmCagAt 

mCgtGtgmCttAtcTctTctGgaTgaAaamCaaGgaTtgGaaGccGtcAatmCt 

mCggAagmCatmCtgmCctTgamCatTctmCcgTtcGcaGtgGtcGccGgcTctG 

AaaGtamCgcTatGtgAggAggmCtaAcamCcaTtcAtaTaaGaamCgcAgcmCa 

TgtTgcmCgtAgaAgaGagActAaaActAagAacGatTgaTtgAagGtcTg 

Tac AatTctTtg mC ag GaaGcaAtaTccGccG g aGtcmCccmCttAtcActAt 

mCtcAcgGagGttAtaAttmCtaTgcAggAggmCaaTnmCtgmCtgGagTtcmCa 

AccGttTcaTgaGagmCtgTaaTcaGgtGttGttTctGtaAaaAgtGtgAa 

GtgGatGtgAaaTtaGtcmCtcAacmCocAgaGcaTttAgtGcaGagAttAg 

GcaGttTaaTgtGaaGctAgtTaaAgtAcaGtcTacGtgGgamCgaGaaAt 

AnGccAagTccAttTctmCgtGccAagTacAttmCaaAatAcaAgaAagGc 

AgamCtcmCtamCaaAtaGatTcgGtgTccTgcmCagAcgAtgTtgAagAatAg 

TtgAagTttGggAatAttGgtAtgGttGaaGacmCaaGgamCcgGatTacGa 

GagG eg mCaaGtaGgcAatGatTcaAgaAgtAgtAaaG gcAalmCgtAacAc 

TgaGcamCaaAgtTaaGatGtlmCggAaaGaaAaaGaaAgtmCaaTccTatGa 

mCaaGtgAccAatmCagmCacGcamCggmCttmCcaTccTcaAgamCtgAtaTtam 

Cc 

AttAaaTgcGcaGatGagGacGgaAcgAatAtcGgaGaaActGatAatAt 

GatGgtAagmCtgAgcGccTtgGacGaaGaaTttGatGttGtcGctActAa 

TacGtcAcgm CaaG gamCag AgcTttG acGacG aaAtaTcamCttGgaGgaTt 

TctmCccTgtGtaGgiAcamCcaAtaTcamCaaGcgmCatTtcTatGtcGacTa 

TgcTaamCacmCagTttAgamCcaTggAaaTccmCacmCgcAaaTatAagmCaaTg 

GcaGgamCatAagAttmCcgGtcAagmCaamCgamCagTgaAgaAagTatGcaAa 

mCcgTctAgtGaaAgcGggAtgGclAaaTtgGgaAaa mCgamCaaGatG ItAt 

GatGctTcaAtaTccTttGatGgtmCgtTagTttAccAtlTtlGgtGtcTt 

iCatTtgAgtTatGtgAagAccGuGgtGggAaaGaaGagAtcAggTg 

GtcTtgGclAccAcamCccAaaArcGttmCgaAacTttAagAgcAttmCtamCt 
Example 9. Performance Analysis of LNA Oligonucleotide Capture Probes Designed to Detect
Ratios of Splice Variants in mRNA Pools.
Oligonucleotide Design for Microarrays.

The methods for designing exon-specific internal oligonucleotide capture probes are
described above.
Design of the LNA-modified Capture Probes 

For the internal LNA-modified oligonucleotide capture probes, every third DNA
nucleotide was substituted with an LNA nucleotide. The probes designed to capture the
junctions of the recombinant splice variants were designed with LNA modifications in a block of
five consecutive LNA nucleotides, two on the 5' side of the splice junction and three on the 3'
side of the splice junction. All capture probes are shown in Table 14.
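
As an illustrative sketch only (it is not part of the original disclosure), the every-third-nucleotide substitution pattern and the notation used in Tables 13 and 14 (uppercase for LNA, "mC" for LNA methyl cytosine) can be written in a few lines of Python. The function names to_lna3 and to_plain_dna are hypothetical, and the sketch assumes that the first nucleotide of the probe is the first LNA position, as the listed LNA3 sequences suggest.

```python
def to_lna3(dna: str) -> str:
    """Render a DNA sequence in LNA3 notation.

    Assumption: positions 0, 3, 6, ... (counting from the 5' end) carry the
    LNA substitution; LNA cytosine is written "mC" (LNA methyl cytosine).
    """
    out = []
    for i, base in enumerate(dna.lower()):
        if i % 3 == 0:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


def to_plain_dna(lna: str) -> str:
    """Strip the LNA notation and return the plain lowercase DNA sequence."""
    return lna.replace("mC", "c").lower()


if __name__ == "__main__":
    # First 12 nt of the first Table 13 probe, written as plain DNA.
    print(to_lna3("tgccattgcacg"))
    print(to_plain_dna("TgcmCatTgcAcg"))
```

Converting back to plain DNA is convenient when a probe has to be compared with a genomic or transcript sequence, since the "mC" token otherwise shifts every downstream position by one character.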


Table 14. Internal, exon-specific and merged, exon-exon junction-specific oligonucleotide
capture probes used in this example.

Capture probes             Sequence (LNA = uppercase, DNA = lowercase letters)

gene78.01a_50_LNA3         mCctGaaAgtAgaTtt£ttAttTcc^
gene78.01b_50_LNA3         mCalAtamtomCaaAtaGtanQ^CaaAaaTcamCaaGaaAacTcamCaamCacTg
gene78.03a_50_LNA3         GatTtgmCagmCggTggTaaAaaGtaTgaAaamCgtGgtAatTaaAagGtcTc
gene78.03b_50_LNA3         mCraAtgAaaActAatmCaaAggTaaAcgTggAtcmCcaTggmCaaTtcmCcgGg
gene78.m01INS3_50_block    caacactgcccagaggttcaatcGATmCmCgatgatcctaatgaaggcgccc
gene78.mINS303_50_block    gtccagtatcgtccatcatAGTATcgataaatatgtgaaggaaatgcctg
gene78.m01INS4_50_block    caacactgcccagaggttcaatcGATGTgtgataggatcagtgttcaggg
gene78.mINS403_50_block    gaaggcgaaggagactgctAATATcgataaatatgtgaaggaaatgcctg

Printing and Coupling of the Splice Isoform-specific Microarrays 

The splice variant capture probes were synthesized with a 5' anthraquinone (AQ)-modification,
followed by a hexaethyleneglycol-2 (HEG2) linker. The capture probes were first
diluted to a 20 µM final concentration in 100 mM Na-phosphate buffer pH 7.0, and spotted on
the Immobilizer polymer microarray slides (Exiqon, Denmark) using the Biochip Arrayer One
(Packard Biochip Technologies, USA) with a spot volume of 2 x 300 pl and 300 µm between
the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in
a Stratalinker with 2300 µjoules (Stratagene, USA). Non-immobilized capture probe
oligonucleotides were removed from the slides by washing the slides two times for 15 minutes in
1x SSC. After washing, the slides were dried by centrifugation at 1000 x g for 2 minutes, and
stored in a slide box until microarray hybridization.
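
For orientation, the amount of capture probe deposited per spot follows directly from the spotting concentration and volume. The short Python sketch below is an illustration only; it assumes that the spot volume is read as two 300 pl drops per spot and that the 20 µM dilution is the concentration actually dispensed. The function name moles_per_spot is hypothetical.

```python
def moles_per_spot(conc_molar: float, drop_volume_l: float, drops: int) -> float:
    """Moles of probe deposited in one spot (concentration x total volume)."""
    return conc_molar * drop_volume_l * drops


if __name__ == "__main__":
    # 20 uM probe, 2 drops of 300 pl each (assumed reading of the spot volume).
    amount_mol = moles_per_spot(20e-6, 300e-12, 2)
    print(f"{amount_mol * 1e15:.1f} fmol of capture probe per spot")
```

Under these assumptions each spot receives on the order of 12 fmol of capture probe before the washing step removes non-immobilized oligonucleotide.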
Construction of Splice Variant Vectors

The recombinant splice variant constructs were cloned into the Triamp18 vector
(Ambion, USA). The constructs were sequenced to confirm their construction. The plasmid
clones were transformed into E. coli XL10-Gold (Stratagene, USA).
Triamp18/SWI5 Vector Construct

Genomic DNA was prepared from a wild type standard laboratory strain of
Saccharomyces cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences,
USA) according to the supplier's instructions. Amplification of the partial yeast gene was
performed by standard PCR using yeast genomic DNA as template. In the first step of
amplification, a forward primer containing a restriction enzyme site and a reverse primer
containing a universal linker sequence were used. In this step, 20 bp was added to the 3'-end of
the amplicon, next to the stop codon. In the second step of amplification, the reverse primer
was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme site.
The SWI5 amplicon contains 730 bp of the SWI5 ORF plus the 20 bp universal linker sequence and
a poly-A20 tail.
The PCR primers used were:

YDR146C-For-EcoRI: acgtgaattcaaatacagacaatgaaggagatga
YDR146C-Rev-Uni: gatccccgggaattgccatgttacctttgattagttttcattggc

Uni-polyT-BamHI: acgtggatccttttttttttttttttttttgatccccgggaattgccatg.
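
The two-step primer design can be checked with a short sketch. The Python below is illustrative only: it takes the first 20 nt of the YDR146C-Rev-Uni primer as the universal linker, rebuilds a nested primer from the description (BamHI site, T20 tract, linker) rather than from the printed sequence, and derives the tail that the two PCR steps would add to the sense strand. The helper names revcomp and sense_strand_tail are hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


FOR_ECORI = "acgtgaattcaaatacagacaatgaaggagatga"
REV_UNI = "gatccccgggaattgccatgttacctttgattagttttcattggc"

LINKER = REV_UNI[:20]        # universal linker carried by the reverse primer
GENE_3PRIME = REV_UNI[20:]   # gene-specific part, antisense to the ORF 3' end

# Assumption: nested primer rebuilt from the text (4 nt clamp, BamHI site,
# T20 tract, universal linker), not copied from the printed primer.
NESTED = "acgt" + "ggatcc" + "t" * 20 + LINKER


def sense_strand_tail() -> str:
    """Sequence appended 3' of the ORF on the sense strand after both steps."""
    return revcomp(NESTED)


if __name__ == "__main__":
    print("EcoRI site in forward primer:", "gaattc" in FOR_ECORI)
    print("ORF 3' end (sense):", revcomp(GENE_3PRIME))
    print("appended tail (sense):", sense_strand_tail())
```

On the sense strand the gene-specific part of the reverse primer reads through a TAA stop codon, followed by the linker, an A20 tract and the BamHI site, which agrees with the statement that the 20 bp linker was added next to the stop codon.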

The PCR amplicon was cut with the restriction enzymes EcoRI and BamHI. The DNA
fragment was ligated into the pTRIamp18 vector (Ambion, USA) using the Quick Ligation Kit
(New England Biolabs, USA) according to the supplier's instructions and transformed into E.
coli DH-5α by standard methods.

Construction of the Recombinant Splice Variant #1 (Triamp18/swi5-rubisco)

The Arabidopsis thaliana Rubisco small subunit ssu2b gene fragment (gi 17064721) was
amplified from genomic DNA by primers named DJ305 5'-ACTATGATGGACGATACTGGAC-3'
and DJ306 5'-ATTGGATCGATCCGATGATCCTAATGAAGGC-3', containing ClaI restriction site linkers.
The purified PCR fragment was digested with ClaI and then cloned into the swi5 (gi:7839148)
vector at the unique ClaI site (atcgat), giving each insert a flanking sequence from the original
yeast SWI5 insert (named exon01 and exon03, Fig. 19). The product was inserted in the
reverse orientation, so that the insert sequence is:
atcgatCCGATGATCCTAATGAAGGCGCCCGGGTACTCCTTCTTGCATTCTTCAACT

CTTCAACACTTGAGCGGAGTCGGTGCATCCGAACAATGGAAGCTTCCACATTGTCC 

AGTATCGTCCATCATAGTatcgat 

Nucleotide sequence analysis revealed a difference between the sequence of A. thaliana Rubisco
expected from the GenBank database and that obtained from all sequenced constructs and PCR
products. Position 30 in the Rubisco insert is "C" rather than the expected "A". This SNP was
probably created by PCR. None of the oligonucleotide capture probes used in the example
cover this region. The Rubisco sequence in GenBank is TCCTAATGAAGGCGCCA. The sequence
obtained from the plasmid construct is TCCTAATGAAGGCGCCC.
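
The reported position can be reproduced from the sequences given above. The following Python sketch is an illustration under one stated assumption: positions are counted from the first base of the flanking ClaI site (atcgat) of the insert as printed. Only the first 36 nt of the insert are used, and the function names are hypothetical.

```python
INSERT_START = "atcgatCCGATGATCCTAATGAAGGCGCCCGGGTAC"  # first 36 nt of the insert
GENBANK = "TCCTAATGAAGGCGCCA"    # expected sequence (GenBank)
CONSTRUCT = "TCCTAATGAAGGCGCCC"  # sequence obtained from the plasmid construct


def first_mismatch(a: str, b: str):
    """0-based index of the first differing character, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


def snp_position(insert: str, observed: str, expected: str) -> int:
    """1-based position of the mismatch within the insert sequence."""
    offset = insert.upper().find(observed)
    if offset < 0:
        raise ValueError("observed fragment not found in insert")
    mismatch = first_mismatch(observed, expected)
    if mismatch is None:
        raise ValueError("fragments are identical")
    return offset + mismatch + 1


if __name__ == "__main__":
    print(snp_position(INSERT_START, CONSTRUCT, GENBANK))
```

With this numbering the mismatch falls on position 30 of the insert, the last C of the GCGCCC stretch, which matches the position stated in the text.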
Construction of the Recombinant Splice Variant #2 (Triamp18/swi5-lea)

The Arabidopsis thaliana Lea gene (gil526423) was amplified from genomic DNA
with primers named DJ307 5'-GGAATTATCGATGTGTGATAGGATCAGTGTTCAG-3' and
DJ308 5'-AATTGGATCGATATTAGCAGTCTCCTTCGCC-3', including the ClaI linker sites
as above. The PCR fragment was digested with ClaI and cloned into the yeast SWI5 IVT construct
as above at the unique ClaI site. The fragment was inserted in the forward orientation, resulting
in the following insert sequence:

atcgatGTGTGATAGGTTCAGTGTTCAGGGCTGTCCAAGGAACGTATGAGCATGCGAG 
AGACGCTGTAGTTGGAAAAACCCACGAAGCGGCTGAGTCTACCAAAGAAGGAGCT
CAGATAGCTTCAGAGAAAGCGGTTGGAGCAAAGGACGCAACCGTCGAGAAAGCTA 
AGGAAACCGCTGATTATACTGCGGAGAAGGTGGGTGAGTATAAAGACTATACGGT 
TGATAAAGCTAAAGAGGCTAAGGACACAACTGCAGAGAAGGCGAAGGAGACTGCT 
AATatcgat 


Preparation of Target 

In vitro RNA Preparation from Splice Variant Vectors 

In vitro RNA from the splice variants was made using the MEGAscript™ high yield
transcription kit according to the manufacturer's instructions (Ambion, USA). The yield of
IVT RNA was quantified using a NanoDrop spectrophotometer (NanoDrop Technologies, USA).

Isolation of Total RNA from C. elegans

C. elegans wild-type strain (Bristol-N2) was maintained on nematode growth medium
(NG) plates seeded with Escherichia coli strain OP50 at 20 °C, and the mixed stages of the
nematode were prepared as described in Hope, I. A. (ed.), "C. elegans - A Practical Approach",
Oxford University Press, 1999. The samples were immediately flash frozen in liquid N2 and
stored at -80 °C until RNA isolation.

A 100 µl aliquot of packed C. elegans worms from a mixed stage population was
homogenized using the FastPrep Bio101 from Kem-En-Tec for 1 minute at speed 6, followed by
isolation of total RNA from the extracts using the FastPrep Bio101 kit (Kem-En-Tec) according
to the manufacturer's instructions. The eluted total RNA was ethanol precipitated for 24 hours
at -20 °C by addition of 2.5 volumes of 96% EtOH and 0.1 volume of 3 M Na-acetate, pH 5.2
(Ambion, USA), followed by centrifugation of the total RNA sample for 30 minutes at 13200
rpm. The total RNA pellet was air-dried and redissolved in 10 µl of diethylpyrocarbonate
(DEPC)-treated water (Ambion, USA) and stored at -80 °C.
Fluorochrome-labelling of the Target 

The following fluorochrome-labelled cDNA targets were synthesized to test the
performance of 'merged' probes that span exon borders. Synthetic RNAs corresponding to
splice variant #1 (exon01-INS3-exon03; 1-INS3-3) and splice variant #2 (exon01-INS4-exon03;
1-INS4-3) were spiked into 10 µg of C. elegans reference total RNA sample in two different
ratios. The first target pool (KU007) contained 10 ng of splice variant #1 (1-INS3-3) transcript
and 2 ng of splice variant #2 (1-INS4-3) transcript, a ratio of 5:1. The second target pool (KU008)
contained 2 ng of splice variant #1 (1-INS3-3) transcript and 10 ng of splice variant #2 (1-INS4-3)
transcript, a ratio of 1:5. Both mRNA pools were combined in separate labeling reactions with
5 ng anchored oligo(dT20) primer and DEPC-treated water to a final volume of 8 µl. The
mixture was heated at 70 °C for 10 minutes, quenched on ice for 5 minutes, followed by addition
of 20 units of Superasin RNase inhibitor (Ambion, USA), 1 µl dNTP solution (10 mM each
dATP, dGTP, dTTP and 0.4 mM dCTP), 3 µl Cy5-dCTP (Amersham Biosciences, USA), 4
µl 5x RTase buffer (Invitrogen), 2 µl 0.1 mM DTT (Invitrogen), 400 units of Superscript II
reverse transcriptase (Invitrogen, USA) and DEPC-treated water to 20 µl final volume.
Background hybridization to merged capture probes was monitored in both hybridizations using
the other fluor channel with 10 µg of C. elegans reference RNA alone labeled with Cy3-dCTP,
according to the labeling method described above for the splice variant spikes. All four cDNA
syntheses were carried out at 42 °C for 2 hours, and the reaction was stopped by incubation at
70 °C for 5 minutes, followed by incubation on ice for 5 minutes.
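
The spike-in design of the two target pools can be summarized numerically. The Python sketch below is illustrative only: the pool compositions are taken from the text, while the variable and function names are hypothetical, and the expected log2 change assumes (as an idealization, not a reported result) that signal scales linearly with spiked transcript mass.

```python
import math

# Spiked transcript masses in ng per pool, as stated in the text.
POOLS = {
    "KU007": {"1-INS3-3": 10.0, "1-INS4-3": 2.0},
    "KU008": {"1-INS3-3": 2.0, "1-INS4-3": 10.0},
}
TOTAL_RNA_NG = 10_000.0  # 10 ug of C. elegans reference total RNA


def variant_ratio(pool: str) -> float:
    """Ratio of splice variant #1 to splice variant #2 within one pool."""
    return POOLS[pool]["1-INS3-3"] / POOLS[pool]["1-INS4-3"]


def mass_fraction(spike_ng: float) -> float:
    """Spike mass relative to the reference total RNA."""
    return spike_ng / TOTAL_RNA_NG


def expected_log2_change(variant: str) -> float:
    """Idealized log2(KU007 / KU008) for one variant, assuming linear signal."""
    return math.log2(POOLS["KU007"][variant] / POOLS["KU008"][variant])


if __name__ == "__main__":
    for name in POOLS:
        print(name, "variant #1 : variant #2 =", variant_ratio(name))
    print("10 ng spike as fraction of total RNA:", mass_fraction(10.0))
    for variant in ("1-INS3-3", "1-INS4-3"):
        print(variant, "log2 change:", round(expected_log2_change(variant), 2))
```

Each spike is at most one part per thousand of the total RNA by mass, and an ideal probe would report a five-fold change of opposite sign for the two variants between the pools.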

Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR
columns as described below. The column was pre-spun for 1 minute at 1500 x g in a 1.5 ml
tube, and the column was placed in a new 1.5 ml tube. The cDNA sample was slowly applied to the top
center of the resin, spun at 1500 x g for 2 minutes, and the eluate was collected. The RNA was
hydrolyzed by adding 3 µl of 0.5 M NaOH, mixing, and incubating at 70 °C for 15 minutes.
The samples were neutralized by adding 3 µl of 0.5 M HCl and mixing, followed by addition of
450 µl 1x TE, pH 7.5 to the neutralized sample and transfer onto a Microcon-30 concentrator
(prior to use, 500 µl 1x TE was spun through the column to remove residual glycerol). The
samples were centrifuged at 14000 x g in a microcentrifuge for 12 minutes. Spinning was
continued until the volume was reduced to 5 µl. The labelled cDNA probes were eluted by
inverting the Microcon-30 tube and spinning at 1000 x g for 3 minutes.

Microarray Hybridization 

The fluorochrome-labelled cDNA samples were combined (the two
different ratios separately). The following were added: 3.75 µl 20x SSC (3x SSC final, which
was passed through a 0.22 µm filter prior to use to remove particulates), yeast tRNA (1 µg/µl final),
0.625 µl 1 M HEPES, pH 7.0 (25 mM final, which was passed through a 0.22 µm filter prior to use
to remove particulates), 0.75 µl 10% SDS (0.3% final) and DEPC-water to 25 µl final volume.
The labelled cDNA target samples were filtered in a Millipore 0.22 µm filter spin column
(Ultrafree-MC, Millipore, USA) according to the manufacturer's instructions, followed by
incubation of the reaction mixture at 100°C for 2-5 minutes. The cDNA probes were cooled at
room temperature for 2-5 minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip
(Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on the
Immobilizer™ MicroArray Slide, and the hybridization mixture was applied to the array from
the side. An aliquot of 30 µl of 3x SSC was added to both ends of the hybridization chamber,
and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber (DieTech,
USA). The chamber was sealed watertight and incubated at 65°C for 16-18 hours submerged in
a water bath. After hybridization, the slide was removed carefully from the hybridization
chamber and washed using the following protocol. The slides were washed sequentially by
plunging gently in 2x SSC/0.1% SDS at room temperature until the cover slip fell off into the
washing solution, then in 1x SSC, pH 7.0 (150 mM NaCl, 15 mM sodium citrate) at room
temperature for 1 minute, then in 0.2x SSC, pH 7.0 (30 mM NaCl, 3 mM sodium citrate) at
room temperature for 1 minute, and finally in 0.05x SSC (7.5 mM NaCl, 0.75 mM sodium
citrate) for 5 seconds, followed by drying of the slides by spinning at 1000 x g for 2 minutes.
The slides were stored in a slide box in the dark until scanning.
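The final concentrations quoted for the hybridization mixture and the wash buffers follow from simple dilution arithmetic (C1 x V1 = C2 x V2). The following is a minimal sketch that re-derives them; the function names are illustrative only and are not part of the protocol.

```python
def final_concentration(stock, stock_volume_ul, final_volume_ul):
    """Concentration after diluting a stock into the final volume (C1*V1 = C2*V2)."""
    return stock * stock_volume_ul / final_volume_ul


def ssc_salts_mm(strength_x):
    """NaCl and sodium citrate (mM) at a given SSC strength, taking 1x SSC as 150 mM / 15 mM."""
    return 150.0 * strength_x, 15.0 * strength_x


FINAL_VOLUME_UL = 25.0

ssc_x = final_concentration(20.0, 3.75, FINAL_VOLUME_UL)        # 20x SSC stock, result in "x SSC"
hepes_mm = final_concentration(1000.0, 0.625, FINAL_VOLUME_UL)  # 1 M HEPES stock, result in mM
sds_percent = final_concentration(10.0, 0.75, FINAL_VOLUME_UL)  # 10% SDS stock, result in percent

print(f"SSC: {ssc_x:g}x, HEPES: {hepes_mm:g} mM, SDS: {sds_percent:g}%")
for strength in (1.0, 0.2, 0.05):
    nacl, citrate = ssc_salts_mm(strength)
    print(f"{strength:g}x SSC: {nacl:g} mM NaCl, {citrate:g} mM sodium citrate")
```

The computed values agree with the 3x SSC, 25 mM HEPES and 0.3% SDS final concentrations and with the salt concentrations given for the 1x, 0.2x and 0.05x SSC washes.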
Microarray data analysis 

The splice variant microarray was scanned in a ScanArray 4000XL confocal laser
scanner (Packard Instruments, USA). The hybridization data were analysed using the GenePix
Pro 4.01 microarray analysis software (Axon, USA). Only the Cy5 (650 nm) data were
examined, as both hybridizations produced comparable, and acceptably low, signal from the
C. elegans reference RNA alone (Cy3 channel).

Normalization 

Data were normalized so that they could be compared between hybridizations. Both
hybridizations contained the same amount of RNA from synthetic exon 01 and exon 03 (10+2
ng), so the signal from the capture probes designed to internal regions of these exons is expected to
be equal. The ratio of raw Cy5 signal between the two different labeled cDNA target pools,
designated as the KU007 and KU008 hybridizations, was calculated for each probe corresponding
to either of these exons; that is, for each probe i we calculated the ratio
(probe i, KU007) / (probe i, KU008). The average of all of these ratios was used as the normalization ratio.
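The normalization step described above can be sketched as follows. The probe names and intensities are hypothetical, and applying the ratio by rescaling the KU008 signals is an assumption; the text only states how the ratio itself was obtained.

```python
from statistics import mean


def normalization_ratio(signal_a, signal_b, reference_probes):
    """Average of the per-probe ratios signal_a / signal_b over the reference probes."""
    return mean(signal_a[p] / signal_b[p] for p in reference_probes)


def rescale(signals, ratio):
    """Multiply every signal by the normalization ratio (assumed way of applying it)."""
    return {probe: value * ratio for probe, value in signals.items()}


# Hypothetical raw Cy5 intensities for probes internal to exon 01 and exon 03.
ku007 = {"exon01_int_a": 1200.0, "exon01_int_b": 900.0, "exon03_int_a": 1500.0}
ku008 = {"exon01_int_a": 1000.0, "exon01_int_b": 750.0, "exon03_int_a": 1250.0}

ratio = normalization_ratio(ku007, ku008, list(ku007))
ku008_rescaled = rescale(ku008, ratio)
print(f"normalization ratio: {ratio:.2f}")
```

After rescaling, the internal exon probes give the same signal in both hybridizations on average, so that the remaining differences at the junction probes reflect the spiked ratios.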
Expectations of normalized data. 

To reflect the proportions of RNA spiked into the hybridization, the ratio of signal in
hybridization KU007/KU008 should be 5 for probes designed to exon junctions of the INS3
splice variant #1 and 0.2 for probes corresponding to the 1-INS4 splice variant #2. Data were log2
transformed: log2(5) = +2.32, log2(0.2) = -2.32. The merged probe corresponding to the
exon 01-exon 03 border desirably produces a consistently low value that is independent of
which transcript was more abundant, i.e., log2(ratio) = 0.
Array results 

Results are summarized in Table 15. 50-mer capture probes containing LNA in a block
spanning exon-exon junctions were consistent in producing the expected ratios.


Table 15. LNA 50-mer block probes are most consistent in producing overall data closest to
expected ratios.

Capture probe      Expected ratio (log2)   Observed ratio (log2) with merged
                                           LNA block capture probes
gene78.m0103        0.00                   -0.24
gene78.m01INS3      2.32                    2.93
gene78.m01INS4     -2.32                   -2.39
gene78.mINS303      2.32                    3.11
gene78.mINS403     -2.32                   -0.86
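As a quick check on Table 15, the expected log2 values can be recomputed from the spiked ratios and compared with the observed values. This is only an illustrative sketch; the deviation measure (observed minus expected) is our own choice and not part of the original analysis.

```python
import math

# Expected KU007/KU008 signal ratios, from the spiked proportions.
expected_ratio = {
    "gene78.m0103": 1.0,
    "gene78.m01INS3": 5.0,
    "gene78.m01INS4": 0.2,
    "gene78.mINS303": 5.0,
    "gene78.mINS403": 0.2,
}

# Observed log2 ratios as listed in Table 15.
observed_log2 = {
    "gene78.m0103": -0.24,
    "gene78.m01INS3": 2.93,
    "gene78.m01INS4": -2.39,
    "gene78.mINS303": 3.11,
    "gene78.mINS403": -0.86,
}

deviation = {
    probe: observed_log2[probe] - math.log2(ratio)
    for probe, ratio in expected_ratio.items()
}

for probe, delta in deviation.items():
    print(f"{probe:16s} expected {math.log2(expected_ratio[probe]):+.2f}  "
          f"observed {observed_log2[probe]:+.2f}  deviation {delta:+.2f}")
```

On these numbers, four of the five probes fall within about 0.8 log2 units of the expected value, while gene78.mINS403 deviates the most.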



Example 10: Improved Signal-to-noise Ratios using LNA Oligonucleotide Capture Probes
Combined with cDNA Target Fragmentation with the E. coli Uracil-DNA Glycosylase.
Capture Probe Design

The capture probes were designed to a 602 nucleotide 3'-region of the yeast (S.
cerevisiae) 70 kDa heat shock protein (SSA4) gene. The 602-base pair sequence is shown in
Table 16. For the LNA-spiked oligonucleotide capture probes, every third DNA nucleotide was
substituted with an LNA nucleotide. All capture probes are shown in Table 17.
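The substitution pattern can be illustrated with a short sketch. The notation is inferred from the probe listing in Table 17 (an upper-case letter marks an LNA position and LNA cytosine is written "mC"); the function name and its `period` parameter are hypothetical.

```python
def lna_spike(dna, period=3):
    """Mark every `period`-th nucleotide, starting with the first, as LNA.

    LNA positions are written in upper case; an LNA cytosine is written 'mC'.
    """
    spiked = []
    for index, base in enumerate(dna.lower()):
        if index % period == 0:
            spiked.append("mC" if base == "c" else base.upper())
        else:
            spiked.append(base)
    return "".join(spiked)


# DNA sequence of capture probe YER103W-1-DNA from Table 17.
yer103w_1_dna = "gccccactggagcaccagacaacggcccaacggttgaagaggttgattag"
print(lna_spike(yer103w_1_dna))
```

Applied to YER103W-1-DNA, this reproduces the sequence listed for YER103W-1-LNA1, which carries an LNA nucleotide at 17 of its 50 positions.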

Table 16. Six hundred and two (602) base sequence stretch of the S. cerevisiae SSA4 gene. The
underline indicates the position of the capture probes. The first underline corresponds to capture probe
YER103W-554, the second underline to capture probe YER103W-492, and so forth.


ggtgaaaggacaaggacaaaagacaacaatctactgggtaaatttgagttgagcggtattccacccgctccaagaggcgtaccac
aaattgaagttacatttgatatcgatgcaaatggtattctgaacgtatctgccgttgaaaaaggtactggtaaatctaacaagat
tacaattactaacgataagggaagattatcgaaggaagatatcgataaaatggttgctgaggcagaaaagttcaaggccgaagat
gaacaagaagctcaacgtgttcaagctaagaatcagctagaatcgtacgcgtttactttgaaaaattctgtgagcgaaaataact
tcaaggagaaggtgggtgaagaggatgccaggaaattggaagccgccgcccaagatgctataaattggttagatgcttcgcaagc
ggcctccaccgaggaatacaaggaaaggcaaaaggaactagaaggtgttgcaaaccccattatgagtaaattttacggagctgca
ggtggtgccccaggagcaggcccagttccgggtgctggagcaggccccactggagcaccagacaacggcccaacggttgaagagg
ttgattag


Table 17. Capture probes for the SSA4 tile array.

Oligo Name          Sequence
YER103W-1-DNA       gccccactggagcaccagacaacggcccaacggttgaagaggttgattag
YER103W-38-DNA      gccccaggagcaggcccagttccgggtgctggagcaggccccactggagc
YER103W-73-DNA      ccattatgagtaaattttacggagctgcaggtggtgccccaggagcaggc
YER103W-92-DNA      ctagaaggtgttgcaaaccccattatgagtaaattttacggagctgcagg
YER103W-127-DNA     cctccaccgaggaatacaaggaaaggcaaaaggaactagaaggtgttgca
YER103W-200-DNA     ggtgaagaggatgccaggaaattggaagccgccgcccaagatgctataaa
YER103W-245-DNA     actttgaaaaattctgtgagcgaaaataacttcaaggagaaggtgggtga
YER103W-272-DNA     aagaatcagctagaatcgtacgcgtttactttgaaaaattctgtgagcga
YER103W-336-DNA     aatggttgctgaggcagaaaagttcaaggccgaagatgaacaagaagctc
YER103W-393-DNA     taacaagattacaattactaacgataagggaagattatcgaaggaagata
YER103W-447-DNA     cgatgcaaatggtattctgaacgtatctgccgttgaaaaaggtactggta
YER103W-492-DNA     acccgctccaagaggcgtaccacaaattgaagttacatttgatatcgatg
YER103W-554-DNA     ggtgaaaggacaaggacaaaagacaacaatctactgggtaaatttgagtt
YER103W-1-LNA1      GccmCcamCtgGagmCacmCagAcaAcgGccmCaamCggTtgAagAggTtgAttAg
YER103W-38-LNA1     GccmCcaGgaGcaGgcmCcaGttmCcgGgtGctGgaGcaGgcmCccActGgaGc
YER103W-73-LNA1     mCcaTtaTgaGtaAatTttAcgGagmCtgmCagGtgGtgmCccmCagGagmCagGc
YER103W-92-LNA1     mCtaGaaGgtGttGcaAacmCccAttAtgAgtAaaTttTacGgaGctGcaGg
YER103W-127-LNA1    mCctmCcamCcgAggAatAcaAggAaaGgcAaaAggAacTagAagGtgTtgmCa
YER103W-200-LNA1    GgtGaaGagGatGccAggAaaTtgGaaGccGccGccmCaaGatGctAtaAa
YER103W-245-LNA1    ActTtgAaaAatTctGtgAgcGaaAatAacTtcAagGagAagGtgGgtGa
YER103W-272-LNA1    AagAatmCagmCtaGaaTcgTacGcgTttActTtgAaaAatTctGtgAgcGa
YER103W-336-LNA1    AatGgtTgcTgaGgcAgaAaaGttmCaaGgcmCgaAgaTgaAcaAgaAgcTc
YER103W-393-LNA1    TaamCaaGatTacAatTacTaamCgaTaaGggAagAttAtcGaaGgaAgaTa
YER103W-447-LNA1    mCgaTgcAaaTggTatTctGaamCgtAtcTgcmCgtTgaAaaAggTacTggTa
YER103W-492-LNA1    AccmCgcTccAagAggmCgtAccAcaAatTgaAgtTacAttTgaTatmCgaTg
YER103W-554-LNA1    GgtGaaAggAcaAggAcaAaaGacAacAatmCtamCtgGgtAaaTttGagTt

Control capture probes
YFL039C-50          acaagaatacgacgaaagtggtccatctatcgttcaccacaagtgtttct
YFL039C-50_LNA3     AcaAgaAtamCgamCgaAagTggTccAtcTatmCgtTcamCcamCaaGtgTttmCt
YDR146C-50          tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt
YDR146C-50_LNA3     TggGaaTggAacGggGatTatGgtTtcGccAatGaaAacTaaTcaAagGt


Printing and Coupling of the Yeast SSA4 Tile Microarrays 

The SSA4 capture probes were synthesized with a 5' anthraquinone (AQ)-modification,
followed by a hexaethyleneglycol-2 (HEG2) linker. The capture probes (Table 17) were first
diluted to a 20 µM final concentration in 100 mM Na-phosphate buffer pH 7.0, and spotted on
the Immobilizer microarray slides (Exiqon, Denmark) using the Biochip Arrayer One (Packard
Biochip Technologies) with a spot volume of 2 x 300 pl and 400 µm between the spots. The
capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker
with 2300 µjoules (Stratagene, USA). Non-immobilized capture probe oligonucleotides were
removed from the slides by washing the slides twice for 15 minutes in 1x SSC. After washing,
the slides were dried by centrifugation at 1000 x g for 2 minutes, and stored in a slide box until
microarray hybridization.

Yeast Cultures 

Saccharomyces cerevisiae wild-type (BY4741, MATa; his3Δ1; leu2Δ0; met15Δ0;
ura3Δ0) and Δssa4 (MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YER103w::kanMX4) mutant
strains (EUROSCARF) were grown in YPD at 30°C until the A600 density of the cultures
reached 0.8. Half of the cultures were collected by centrifugation and resuspended in one
volume of 40°C preheated YPD. Incubation was continued for an additional 30 minutes at
30°C or 40°C for the standard and heat-shocked cultures, respectively. Cells were harvested by
centrifugation and stored at -80°C.
RNA Extraction

Total RNA was extracted using the FastRNA Kit-RED (BIO 101) according to the
supplier's instructions. The quantity and quality of the RNA preparations were examined by
standard spectrophotometry on a NanoDrop ND-1000 and by gel electrophoresis. Only high
quality RNA preparations were used for microarray analyses.
Fluorochrome-labelling of the Target 

A total of seven cDNA assay mixtures were produced, each with ten (10) µg total RNA
from the SSA4 wild-type (wt) strain combined with 5 µg anchored oligo(dT20) primer and
DEPC-treated water to a final volume of 8 µl. The mixtures were heated at 70°C for 10 minutes
and quenched on ice for 5 minutes, followed by addition of 20 units of Superasin RNase inhibitor
(Ambion, USA), 3 µl Cy3-dCTP (Amersham Biosciences), 10 mM final concentration of dATP and
dGTP, 4 µl 5x RTase buffer (Invitrogen), 2 µl 0.1 mM DTT (Invitrogen), 400 units of Superscript II
reverse transcriptase (Invitrogen, USA), dUTP and dTTP according to Table 18, and DEPC-treated
water to 20 µl final volume. A parallel set-up was made with 10 µg total RNA from Δssa4 for
target cDNA labelling with Cy5-dCTP. All cDNA syntheses were carried out at 42°C for 2
hours, and the reaction was stopped by incubation at 70°C for 5 minutes, followed by
incubation on ice for 5 minutes. Each cDNA pool (except the unfragmented control pool) was
incubated at 37°C for 2 hours with 2 units of Uracil-DNA Glycosylase (UDG, New England
Biolabs, USA) and 2.4 µl (1x final concentration in the reaction mixture) of
UDG reaction buffer. The enzyme was heat-inactivated at 95°C for 10 minutes.
Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR columns as
described in Example 9.

Table 18. dUTP and dTTP ratios in cDNA target labelling.

Assay #   Final conc. dUTP   Final conc. dTTP
1         0.5                0
2         0.25               0.25
3         0.125              0.375
4         0.05               0.45
5         0.025              0.475
6         0.0125             0.4875
7         0                  0.5

Gel electrophoresis of the cDNA target pools

0.5 µl of each of the seven fragmented cDNA pools was analysed on a 2% agarose gel.
The data show that the cDNA is fragmented linearly with respect to the concentration of dUTP
used in the synthesis. Fig. 38 shows the gel electrophoresis of fragmented cDNA from the
yeast wild-type strain.
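One convenient way to read Table 18 is as the fraction of the combined dUTP + dTTP pool that is dUTP in each assay. The sketch below computes that fraction from the tabulated values; it is an illustration only, and treating this fraction as the quantity that governs the degree of fragmentation is an assumption rather than a statement from the experiment.

```python
# (assay number, final conc. dUTP, final conc. dTTP) as listed in Table 18.
TABLE_18 = [
    (1, 0.5, 0.0),
    (2, 0.25, 0.25),
    (3, 0.125, 0.375),
    (4, 0.05, 0.45),
    (5, 0.025, 0.475),
    (6, 0.0125, 0.4875),
    (7, 0.0, 0.5),
]


def dutp_fraction(dutp, dttp):
    """Fraction of the dUTP + dTTP pool that is dUTP."""
    return dutp / (dutp + dttp)


fractions = {assay: dutp_fraction(dutp, dttp) for assay, dutp, dttp in TABLE_18}
for assay, fraction in fractions.items():
    print(f"assay {assay}: dUTP fraction {fraction:.3f}")
```

The combined dUTP + dTTP concentration is the same in all seven assays, so only the dUTP fraction varies, from 1 in assay 1 down to 0 in the unfragmented control (assay 7).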

Comparative Hybridization of the SSA4 Tile Array with Fluorochrome-labelled wild-type and
Δssa4 cDNA Target Pools and Post-hybridization Washes

The fluorochrome-labelled cDNA samples were combined (the different
UDG-fragmented samples separately). The following were added: 3.75 µl 20x SSC (3x SSC
final, passed through a 0.22 µm filter prior to use to remove particulates), yeast tRNA (1 µg/µl final),
0.625 µl 1 M HEPES, pH 7.0 (25 mM final, passed through a 0.22 µm filter prior to use to remove
particulates), 0.75 µl 10% SDS (0.3% final) and DEPC-water to 25 µl final volume. The
labelled cDNA target samples were filtered in a Millipore 0.22 µm filter spin column
(Ultrafree-MC, Millipore, USA) according to the manufacturer's instructions, followed by incubation of
the reaction mixture at 100°C for 2-5 minutes. The cDNA probes were cooled at room temperature
for 2-5 minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie
Scientific Company, USA) was carefully placed on top of the SSA4 microarrays spotted on the
Immobilizer™ MicroArray Slide, and the hybridization mixture was applied to the array from
the side. An aliquot of 30 µl of 3x SSC was added to both ends of the hybridization chamber,
and the slide was placed in the hybridization chamber (DieTech, USA). The chamber was
sealed watertight and incubated at 65°C for 16-18 hours submerged in a water bath. After
hybridization, the slide was removed carefully from the hybridization chamber and washed
using the following protocol. The slides were washed sequentially by plunging gently in 2x
SSC/0.1% SDS at room temperature until the cover slip fell off into the washing solution, then
in 1x SSC, pH 7.0 (150 mM NaCl, 15 mM sodium citrate) at room temperature for 1 minute,
then in 0.2x SSC, pH 7.0 (30 mM NaCl, 3 mM sodium citrate) at room temperature for 1
minute, and finally in 0.05x SSC (7.5 mM NaCl, 0.75 mM sodium citrate) for 5 seconds,
followed by drying of the slides by spinning at 1000 x g for 2 minutes. The slides were stored in
a slide box in the dark until scanning.
Microarray Data Analysis 

The slides were scanned in a ScanArray 4000XL confocal laser scanner (Packard
Instruments, USA). The hybridization data were analysed using the GenePix Pro 4.01
microarray analysis software (Axon, USA).

In the data analysis, the differences in labelling efficiency between the two fluorescent
dyes were scaled by using an internal normalization approach. The average signal intensities
from the control capture probes (Table 17) were used to calculate the normalization factor,
and the signal intensity values from the Cy3 target were multiplied by this factor. Analysis of the
data demonstrates that capture probes with LNA in every third position have up to 5.2-fold
higher signal-to-noise ratios, compared to the DNA capture probes (Fig. 39).
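The internal dye normalization can be sketched as below. The text does not spell out the formula, so defining the factor as the mean Cy5 control signal divided by the mean Cy3 control signal is an assumption, and all intensities shown are hypothetical.

```python
from statistics import mean


def dye_normalization_factor(cy5_controls, cy3_controls):
    """Assumed definition: mean Cy5 control signal / mean Cy3 control signal."""
    return mean(cy5_controls) / mean(cy3_controls)


def normalize_cy3(cy3_signals, factor):
    """Scale the Cy3 signals by the normalization factor."""
    return [signal * factor for signal in cy3_signals]


# Hypothetical control capture probe intensities in the two channels.
cy5_controls = [2000.0, 2200.0, 1800.0, 2000.0]
cy3_controls = [1000.0, 1100.0, 900.0, 1000.0]

factor = dye_normalization_factor(cy5_controls, cy3_controls)
cy3_normalized = normalize_cy3([500.0, 1500.0], factor)
print(factor, cy3_normalized)
```

With this choice, the control probes give the same average signal in both channels after normalization, so the remaining channel differences at the SSA4 probes are attributable to the samples rather than to the dyes.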

Example 11: Interpretation of Splice Array Data Using LNA Discriminating Probes.

This example illustrates the interpretation of microarray analysis of alternative mRNA
splicing. Different LNA capture probe design types are formalized, and the expression constant
θ is introduced as a measurement of alternative splicing.
Introduction 

The eukaryotic pre-mRNA is the subject of splicing and alternative splicing; hence
"sequences" refers to RNA sequences, "original sequence" refers to pre-mRNA, and "splice forms"
refers to mRNA sequences. Splicing is conducted by a cellular machinery named the
spliceosome. The terms exons and introns can be used to refer to regions of pre-mRNA
sequences (or more specifically a single splice form). It is noted that a part of the
corresponding DNA/pre-mRNA sequence that is an exon (not excised) in one splice form can
potentially be absent in another splice form (e.g., partly absent in exon truncation and
completely absent in exon skipping). Thus, the terms "constant regions" and "variable regions"
(see below) are useful for characterizing the process of identifying different splice forms.

Splicing can be defined as the production of a new sequence via the excision of part(s)
of an original sequence (Fig. 40). Alternative splicing can be defined as the production of more
than one novel sequence via the excision of different parts of the original sequence. When
comparing two different splice forms, they can be divided into a constant region that is shared
by both sequences and a variable region by which the two splice forms differ (Fig. 41).

Alternative splicing can be categorized in terms of (i) whether the variable region
is flanked by a single constant region or surrounded by two constant regions, (ii) the size of the
variable region (e.g., exon skipping/intron retention vs. extension and truncation) [(intron/exon)
5' and 3'], and (iii) the number of variable regions (and hence the number of splice forms).
Capture Probe Design 

Capture probe designs can be divided into 3 distinct types according to their position:
Merged Probes (MP) or Junction Probes, Unique Internal Probes (UIP), and Shared Internal
Probes (SIP) (Fig. 42). Considering the case of a single variable region surrounded by constant
regions, there are several different possible capture probe positions for each type (Fig. 43).
Data Interpretation 

The aim of the analyses can be to determine (i) whether a given original sequence is
subject to alternative splicing (i.e., whether there is more than one splice form present), and (ii)
whether there is a difference in alternative splicing of the original sequence between two
biological samples (i.e., whether the proportions between the two splice forms differ between
biological samples). The analysis can also be used for data validation.

Possible biases in the microarray platform include (a) noise in terms of non-specific
binding and subsequent false signal, (b) differences in dye labeling efficiency, (c) differences in
capture probe affinity, (d) differences in sample conditions (e.g., number of cells, and amount
of RNA), and (e) differences in reverse transcriptase efficiency of different splice forms.
Biases can be corrected for by various means of normalization and/or standardization.
Data Analysis 

In order to analyze the expression of the different splice forms, the expression constant
θ is introduced. θ denotes the relation between the proportions of the signals (capture probes a
and b) between the labeled extracts from biological samples (labeled with Cy5 and Cy3). That is,

(Cy5a / Cy3a) = (Cy5b / Cy3b) * θ or,

θ = (Cy5a / Cy3a) / (Cy5b / Cy3b)
  = (Cy5a * Cy3b) / (Cy5b * Cy3a) and,

θ = (Cy5a / Cy5b) / (Cy3a / Cy3b)
  = (Cy5a * Cy3b) / (Cy3a * Cy5b)
  = (Cy5a * Cy3b) / (Cy5b * Cy3a) [same as above]

Considering normalization due to different biases and given a sample normalization
factor S due to differences between the samples in terms of amounts of RNA, RT-efficiency,
dye properties, etc. and a probe normalization factor P due to differences in probes in terms of
affinity, position in target sequence, etc., the following equations apply.

For two probes a and b, a * P = b

For two samples Cy5 and Cy3, Cy5 * S = Cy3

Thus, considering two probes from two samples, the signals are:

- Cy5a * P * S
- Cy5b * S
- Cy3a * P
- Cy3b

With respect to θ:

θ = [(Cy5a * P * S) / (Cy3a * P)] / [(Cy5b * S) / (Cy3b)]
  = (Cy5a * P * S * Cy3b) / (Cy3a * P * Cy5b * S)
  = (Cy5a * Cy3b * P * S) / (Cy3a * Cy5b * P * S)
  = (Cy5a * Cy3b) / (Cy3a * Cy5b) * (P * S) / (P * S)
  = (Cy5a * Cy3b) / (Cy3a * Cy5b) * 1
  = (Cy5a * Cy3b) / (Cy3a * Cy5b) [same as without normalization]

Note that the calculation of θ is not affected by the normalization factors S and P; hence it is
not necessary to normalize the array data when interpreting alternative splice arrays with the
use of the expression constant θ.
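The cancellation of S and P can be checked numerically. The following sketch uses hypothetical signal intensities and arbitrary normalization factors; only the formula for the expression constant is taken from the text.

```python
import math


def expression_constant(cy5_a, cy3_a, cy5_b, cy3_b):
    """Expression constant: (Cy5a / Cy3a) / (Cy5b / Cy3b)."""
    return (cy5_a * cy3_b) / (cy3_a * cy5_b)


# Hypothetical raw signals for capture probes a and b in the two channels.
cy5_a, cy3_a, cy5_b, cy3_b = 800.0, 200.0, 600.0, 300.0
raw = expression_constant(cy5_a, cy3_a, cy5_b, cy3_b)

# Arbitrary probe (P) and sample (S) normalization factors, applied as in the text:
# Cy5a * P * S, Cy5b * S, Cy3a * P, Cy3b.
P, S = 1.7, 0.6
scaled = expression_constant(cy5_a * P * S, cy3_a * P, cy5_b * S, cy3_b)

print(raw, scaled)
```

Both calls give the same value (2.0 for these numbers, up to floating-point rounding), in line with the derivation: P appears once in the numerator and once in the denominator, and so does S.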
Properties of θ

If θ = 1, there is no difference in the proportions of the targets of capture probes a and b
in the two samples. Even in the case of alternative splicing, it is not possible to determine
whether there is more than a single splice form present using this particular method. If θ ≠ 1,
there is a difference in the proportions of the targets of capture probes a and b in the two
samples, thus there is a difference in splice pattern and therefore there must be more than one
splice form present.
Comparing θ's

θ can be compared between different transcripts to determine whether they have
correlated expression, and θ's from sets of capture probes from the same transcript (different
probes) can be averaged.
Example

Consider a simple example of a single large variable region surrounded by constant
regions, using a combination of a Merged Probe (MP) and a Shared Internal Probe (SIP). θ of a
single splice form can be calculated using the following equation:

θ = (Cy5MP * Cy3SIP) / (Cy3MP * Cy5SIP)

If θ = 1, there is no difference in the proportions of the targets of capture probes a and b in the
two samples, and it may not be possible to determine whether multiple splice forms are present
using this particular method. If θ ≠ 1, there is a difference in the proportions of the targets of
capture probes a and b, thus there is a difference in splice pattern and therefore there must be
more than one splice form present.
Conclusions 

It is possible to infer a difference in expression level of two capture probe targets from
two tissues by comparing the proportion of signals from one capture probe with the
proportion of signals from the other probe. In contrast, single signals may be subject to biases
from normalizations and standardizations for each probe and sample.

Example 12: Exemplary Microarrays 

The nucleic acid arrays of the invention can be generated by standard methods for either
synthesis of nucleic acid probes that are then bonded to a solid support or synthesis of the
nucleic acid probes on a solid support (e.g., by sequential addition of nucleotides to a reactive
group on the solid support). In desirable methods for on-chip synthesis of the capture probes,
photogenerated acids are produced in light-irradiated sites of the chip and used to deprotect the
5'-OH group of nucleic acid monomers and oligomers (e.g., to remove an acid-labile protecting
group such as 5'-O-DMT) to which a nucleotide is to be added (Gao et al., Nucleic Acids
Research 29:4744-4750, 2001). Standard methods can also be used to label the nucleic acids in
a test sample with, e.g., a fluorescent label, incubate the labeled nucleic acid sample with the
array, and remove any unbound or weakly bound test nucleic acids from the array. Exemplary
methods are described, for example, in U.S.P.N. 6,410,229; 6,406,844; 6,403,957; 6,403,320;
6,403,317; 6,346,413; 6,344,316; 6,329,143; 6,310,189; 6,309,831; 6,309,823; 6,261,776;
6,239,273; 6,238,862; 6,156,501; 5,945,334; 5,919,523; 5,889,165; 5,885,837; 5,744,305;
5,445,934; 5,800,992; and 5,874,219.

In an exemplary method for synthesis of an array, capture probes were
immobilized using AQ technology with a HEG5 linker (U.S.P.N. 6,033,784) onto an
Immobilizer™ slide. An exemplary chip consists of 288 spots in four replicates (i.e.,
1152 spots) with a pitch of 250 µm, and an exemplary hybridization buffer is 5x SSCT
(i.e., 750 mM NaCl, 75 mM sodium citrate, pH 7.2, 0.05% Tween) and 10 mM MgCl2.
An exemplary target is a 45-mer oligonucleotide with Cy5 at the 5' end and with a final
concentration in the hybridization solution of 1 µM.
Hybridization was performed with 200 µL hybridization solution in a
hybridization chamber created by attaching a CoverWell™ gasket to the Immobilizer
slide. The incubation was conducted overnight at 4°C. After hybridization, the
hybridization solution was removed, and the chamber was flushed with 3 x 1.0 mL of the
hybridization buffer described above without any target nucleic acid. A CoverWell™
chamber was then filled with 200 µL hybridization solution without target. The slide
was observed with a Zeiss Axioplan 2 epifluorescence microscope with a 5x Fluar
objective and a Cy5 filterset from OMEGA. The temperature of the microscope stage
was controlled with a Peltier element. Thirty-five images at each temperature were
acquired automatically with a Photometrics camera, automated shutter, and motorized
microscope stage. The images were acquired, stitched together, calibrated and stored in
a stack by the software package "MetaVue".

Arrays can be generated using capture probes of any desired length (e.g., arrays of
pentamers, hexamers, or heptamers). In various embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or more
nucleotides of the probes are LNA nucleotides. Desirably, at least 1, 2, 3, 5, 7, 9, or all of the A
and T nucleotides in the probes are LNA A and LNA T nucleotides. LNA nucleotides can be
placed in any position of the capture probe, such as at the 5' terminus, between the 5' and 3'
termini, or at the 3' terminus. LNA nucleotides may be consecutive or may be separated by one
or more other nucleotides. The microarrays can be used to analyze target nucleic acids of any
"AT" or "GC" content, and are especially useful for analyzing nucleic acids with high "AT"
content because of the increased affinity of the microarrays of the present invention for such
nucleic acids compared to traditional microarrays. Desirably, the array has at least 100, 200,
300, 400, 500, 600, 800, 1000, 2000, 5000, 8000, 10000, 15000, 20000, or more different
probes. If desired, nucleotides with a universal base can be included in the capture probes to
increase the Tm of the capture probes (e.g., capture probes of less than 7, 6, 5, or 4 nucleotides).
Exemplary "non-discriminatory" nucleotides include inosine, random nucleotides,
5-nitroindole, LNA inosine, and LNA 2-aminopurine. In desirable embodiments, 1, 2, 3, 4, 5, or
more nucleotides with a universal base are located at the 5' and/or 3' termini of the capture
probes.

Example 13: Exemplary Application of Nucleic Acids of the Invention 

An exemplary application of these methods includes comparing hybridization
patterns of cDNA or cRNA from a patient sample to classify early tumors or detect an
infection or a diseased state. The microarrays of the invention may also be used as a
general tool to analyze the PCR products generated by amplification of a test sample
with PCR primers for one or more nucleic acids of interest. For example, PCR primers
can be used to amplify nucleic acids with a particular exon or exon-exon combination,
and then the PCR products can be identified and/or quantified using a microarray of the
invention. For identification of splice variants, PCR primers to specific exons can be
used to amplify nucleic acids that are then applied to a microarray for detection and/or
quantification as described herein. To detect microbial pathogens, species-specific PCR
primers (e.g., primers specific for an exon whose sequence differs among species) can
be used to amplify nucleic acids in a sample for subsequent analysis using a microarray.
For example, the hybridization pattern of the PCR products to the array can be used to
distinguish between different bacteria, viruses, or yeast and even between different
strains of the same pathogenic species. In particular embodiments, the array is used to
determine whether a patient sample contains a bacterial strain that is known to be
resistant or susceptible to particular antibiotics or contains a virus or yeast strain known
to be resistant or susceptible to certain drugs. Changes in product composition or raw
material origin can also be detected using a microarray. The arrays can also be used to
determine the composition of mRNA cocktails.

Exemplary environmental microbiology applications of these arrays include
identification of major rRNA types in contaminated soil samples and classification of
microbial isolates. These rRNA amplificates are formed from rRNA by RT-PCR or from
the rDNA gene by conventional PCR. Numerous general and selective primers for
different groups of organisms have been published. Most frequently an almost full
length amplificate of the 16S rDNA gene is used (e.g., the primers 26F and 1492R). For
purifying rRNA from a soil sample, standard methods such as one or more commercial
extraction kits, e.g., "RNeasy" (QIAGEN), "RNA PLUS" (Q-biogene),
or "Total RNA safe", can be used.

Example 14: Methods for Minimizing the Variance in Melting Temperatures in Nucleic Acid
Populations of the Invention

Any simultaneous use of more than one primer or probe is made difficult because the
involved primers or probes must work under the same conditions. An indication of whether
two or more primers or probes will work under the same conditions is the relative Tm values at
which the hybridized oligonucleotides dissociate. In cases where probes are applied for specific
detection of homologous sequences such as splice variants, the ΔTm is of importance. ΔTm
expresses the difference between the Tm of the match and the Tm of the mismatch hybridizations.
Generally, the larger the ΔTm obtained, the more specific the detection of the sequence of interest. In
addition, a large ΔTm allows more probes to be used simultaneously, and in this way a higher
degree of multiplexity can be applied.

High affinity nucleotide analogs such as LNA can also be used universally to equalize
the melting properties of oligonucleotides with different AT and CG content. The increased
affinity of LNA adenosine and LNA thymidine corresponds approximately to the normal
affinity of DNA guanine and DNA cytosine. An overall substitution of all DNA-A and DNA-T
with LNA-A and LNA-T results in melting properties that are nearly sequence independent and
depend only on the length of the oligonucleotide. This may be important for design of
oligonucleotide probes used in large multiplex analysis. The effect of LNA A and T
substitutions has been evaluated by predicting the Tm value of all possible 9-mer
oligonucleotides with different universal substitutions. The distribution of the 262,000 Tm
values exhibits a very homogeneous Tm value for universally LNA A and T substituted
oligonucleotides. The standard deviation of the melting temperature for all 9-mers drops from
7.7°C for pure DNA to only 2.2°C for LNA A and T substituted oligonucleotides. This
equalizing effect may also be utilized for photomediated on-chip synthesis of oligonucleotides.

It is often difficult to design probes and primers with the same range of melting
temperature due to the variance in A/T and G/C content of the probing sites. Highly A/T rich
regions typically give lower Tm values. Furthermore, if single mismatches are to be resolved,
G/T mismatches are known to contribute little to ΔTm. As discussed above, the use of LNA is a
desirable way to solve problems related to multiplex use of primers and probes. LNA offers the
possibility to adjust Tm and increase the ΔTm at the same time. LNA increases Tm by
4-8°C per substitution and increases ΔTm in many cases (Table 9).
Table 9. Demonstration of LNA controlled increase of Tm and ΔTm.

Tm of LNA:DNA duplexes        Perfect match      Single mismatch    ΔTm
                              3'-ACGACCAC-5'     3'-ACGGCCAC-5'
LNA 8-mer 5'-TGCTGGTG-3'      71°C               45°C               26°C
DNA 8-mer 5'-TGCTGGTG-3'      35°C               25°C               10°C
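
The ΔTm column of Table 9 is the perfect-match Tm minus the single-mismatch Tm. As a minimal sketch of that arithmetic (the dictionary layout and the function name `delta_tm` are illustrative assumptions, not part of the disclosure):

```python
# Tm values in °C copied from Table 9: (perfect match, single mismatch).
TABLE_9 = {
    "LNA 8-mer": (71.0, 45.0),
    "DNA 8-mer": (35.0, 25.0),
}


def delta_tm(tm_match: float, tm_mismatch: float) -> float:
    """Return delta Tm: Tm of the match minus Tm of the mismatch."""
    return tm_match - tm_mismatch


for probe, (tm_match, tm_mismatch) in TABLE_9.items():
    print(f"{probe}: delta Tm = {delta_tm(tm_match, tm_mismatch):.0f} °C")
```

With the Table 9 values, the LNA 8-mer gives a ΔTm of 26°C against 10°C for the DNA 8-mer.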


As LNA can be mixed with DNA during standard oligonucleotide synthesis, LNA can
be placed at optimal positions in probes in order to adjust Tm. The specificity of PCR may also
be enhanced by the use of LNA in primers or probes, and this facilitates a higher degree of
multiplexing. By incorporation of LNA, the Tm of the primers or probes can be adjusted so that
they work at the same temperature. Amplification or hybridization is more specific when LNA is
included in primers or probes. This is due to the LNA-increased ΔTm, which relates to higher
specificity. Once the ΔTm of the primers or probes is high, more primers or probes can
potentially be brought to work together.
Prediction of Tm

LNA can be used to enhance any experiment that is based on hybridization. The series
of algorithms described herein have been developed to predict the optimal use of LNA.
Melting properties of 129 different LNA substituted capture probes described herein, hybridized
to their corresponding DNA targets, were measured in solution using UV-spectrophotometry.
The data set was divided into a training set with 90 oligonucleotides and a test set with 39
oligonucleotides. The training set was used for training of both linear regression models and
neural networks. Neural networks trained with nearest neighbour information, length, and
DNA/LNA neighbour effect are efficient for prediction of Tm with the given set of data.

Applications of the Normalization of Thermal Stability by LNA A and T Nucleotide
Substitutions

All assays in which DNA/RNA hybridization is conducted may benefit from the use of
LNA in terms of increased specificity and quality. Exemplary uses include sequencing, primer
extension assays, PCR amplification, such as multiplex PCR, allele specific PCR amplification,
molecular beacons (e.g., nucleic acids can be multiplexed with one colour based on multiple Tm
values), TaqMan probes, in situ hybridization probes (e.g., chromosomal and bacterial 16S rRNA
probes), capture probes to the mRNA poly-A tail, capture probes for microarray detection of
SNPs, capture probes for expression microarrays (sensitivity increased 5-8 times), and capture
probes for assessment of alternative mRNA splicing.

Example 15: Exemplary Methods for the Prediction of Melting Temperatures for Nucleic Acid
Populations of the Invention

LNA units have different melting properties than DNA and RNA nucleotides. Until
recently, thermodynamic models for melting temperature prediction have existed for DNA
and RNA only, but not for LNA. Now a Tm prediction model for LNA/DNA mixed
oligonucleotides has been developed. The Tm prediction tool is available on-line at the Exiqon
website (www.LNA-Tm.com and http://www.exiqon.com/Poster/Tmpred-ET-view.pdf).

Numerous applications in molecular biology are based on the ability of DNA and RNA
to hybridize in a temperature dependent manner (e.g., microarray techniques, PCR reactions
and blotting techniques). The melting properties of nucleic acid duplexes, in particular the
melting temperature Tm, are crucial for optimal design of such experiments. Tm is usually
computed using a two-state thermodynamic model (Breslauer, Meth. Enzymol., 259:221-242,
1995). Several different groups have estimated model parameters for nearest neighbors in the
sequence based on experimental data (for a review see SantaLucia, Proc. Natl. Acad. Sci.,
95:1460-1465, 1998).

The model described herein predicts the Tm of duplexes of mixed LNA/DNA
oligonucleotides hybridized to their complementary DNA strands. DNA monomers are denoted
with lowercase letters, and LNA monomers are denoted with uppercase letters, i.e., there are
eight types of monomers in the mixed strand: a, c, g, t, A, C, G and T. The model is based on
the formula (SantaLucia, 1998, supra; Allawi et al., Biochemistry 36:10581-10594, 1997)


$$T_m = \frac{\Delta H}{\Delta S + R \ln\left(C - C_m/2\right) + 0.368\,(L - 1)\ln[\mathrm{Na}^+]},$$

in which the salt concentration [Na+] enters as an entropic correction together with the
oligonucleotide concentrations. R is the gas constant, C and Cm are the concentrations of the
two strands where C > Cm, and L is the length of the strands. For self-complementary
sequences, C - Cm/2 is replaced by the total strand concentration CT and a symmetry
correction of -1.4 cal/(K mol) is added to ΔS (SantaLucia, 1998, supra).
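
The formula above can be evaluated directly once ΔH and ΔS are known. The following is a minimal sketch, not the implementation behind the on-line tool: the function name, the argument names, the choice of cal-based units (consistent with the -1.4 cal/(K mol) symmetry correction) and the numerical inputs in the example call are all assumptions made for illustration, and the result is read as an absolute temperature in kelvin.

```python
import math

R_CAL = 1.987  # gas constant in cal/(K*mol); cal-based units are an assumption


def predict_tm_kelvin(dh, ds, c, c_m, length, na, self_complementary=False):
    """Evaluate Tm = dH / (dS + R ln(C - Cm/2) + 0.368 (L - 1) ln[Na+]).

    dh     : total enthalpy change in cal/mol (hypothetical input)
    ds     : total entropy change in cal/(K*mol) (hypothetical input)
    c, c_m : strand concentrations in mol/L, with c > c_m
    length : number of monomers L in the strand
    na     : sodium concentration [Na+] in mol/L
    For self-complementary sequences, c is read as the total strand
    concentration C_T, c_m is ignored, and -1.4 cal/(K*mol) is added to ds.
    """
    if self_complementary:
        conc_term = c
        ds = ds - 1.4
    else:
        conc_term = c - c_m / 2.0
    denominator = (ds + R_CAL * math.log(conc_term)
                   + 0.368 * (length - 1) * math.log(na))
    return dh / denominator


# Illustrative numbers only; they are not fitted model parameters.
tm_1m_salt = predict_tm_kelvin(-60000.0, -170.0, 2e-6, 1e-6, 9, 1.0)
tm_low_salt = predict_tm_kelvin(-60000.0, -170.0, 2e-6, 1e-6, 9, 0.1)
print(round(tm_1m_salt, 1), round(tm_low_salt, 1))
```

With these inputs, lowering [Na+] from 1 M to 0.1 M makes the denominator more negative and therefore lowers the predicted Tm, which is the expected direction of the salt correction.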

The LNA model differs from SantaLucia's DNA model in the way the changes in
enthalpy ΔH and entropy ΔS are calculated. As in SantaLucia's model, they depend on nearest
neighbor sequence information and special contributions for the terminal base-pairs at the two
ends of the duplex. However, with eight types of monomers (LNA and DNA) the increased
number of nearest neighbor combinations requires more model parameters to be determined and
hence more data.

Parameter Reduction 

Usually ΔH and ΔS are calculated as a sum of contributions from all nearest neighbor
pairs in the sequence. The inclusion of LNA doubles the number of monomer types and
quadruples the number of possible nearest neighbor pairs. Parameter reduction strategies are
used for matching the model complexity to limited data sets. One strategy for reducing model
complexity is to sum ΔH from single base-pair contributions, which do not take the influence of
adjacent nucleotides into account. Nearest neighbor contributions are then added as a
correction term to the single base-pair contributions.

Another strategy is to use hierarchically reduced monomer alphabets. Here, similar
monomers are identified with the same letter. A four-letter alphabet, {w,s,W,S}, defines classes
according to binding strength: w={a,t}, s={c,g}, W={A,T} and S={C,G}. The smallest
alphabet, {D,L}, simply identifies the monomer type: DNA or LNA. As an example, the
sequence GcTAAcTt can be written as SsWWWsWw or as LDLLLDLD.
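
The two reduced alphabets amount to simple per-monomer mappings. A minimal sketch follows; the function and constant names are illustrative assumptions, while the class definitions and the example sequence are taken from the paragraph above.

```python
# Binding-strength classes: DNA monomers are lowercase, LNA monomers uppercase.
STRENGTH_CLASS = {
    "a": "w", "t": "w", "c": "s", "g": "s",
    "A": "W", "T": "W", "C": "S", "G": "S",
}


def to_strength_alphabet(sequence: str) -> str:
    """Rewrite a mixed LNA/DNA sequence in the four-letter alphabet {w,s,W,S}."""
    return "".join(STRENGTH_CLASS[monomer] for monomer in sequence)


def to_type_alphabet(sequence: str) -> str:
    """Rewrite a mixed LNA/DNA sequence in the two-letter alphabet {D,L}."""
    return "".join("L" if monomer.isupper() else "D" for monomer in sequence)


print(to_strength_alphabet("GcTAAcTt"))  # SsWWWsWw
print(to_type_alphabet("GcTAAcTt"))      # LDLLLDLD
```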

The principle is to split ΔH and ΔS into contributions that depend on different levels of
detail of the sequence. The fine levels of detail require many parameters to be determined,
while the coarse levels need fewer parameters. The more detailed contributions can then be
treated as minor corrections, thus effectively reducing the total number of model parameters.
Training 

Model parameters were determined using data from melting experiments on hundreds of
oligonucleotides. The oligonucleotides were random sequences with lengths between 8 and 20
and a percentage of LNA between 20 and 70. Melting curves were obtained using a Perkin-Elmer
UV λ-40 spectrophotometer, but only the Tm values were used for modeling. Model
parameters were adjusted using a gradient descent algorithm that minimizes the error function

$$E = \sum_{\text{data set}} \left(T_m^{\text{pred}} - T_m^{\text{exp}}\right)^2,$$

i.e., the distance between predicted and experimental Tm values. Many different models were
trained in this way and their performance was evaluated on test sets distinct from the training
data. Seven reliable models were chosen and combined to form the committee model
implemented at the Exiqon website (www.LNA-Tm.com).
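
To illustrate the kind of fitting described above, the following toy sketch adjusts two parameters by gradient descent so that predicted Tm values approach measured ones. It is not the model used for the on-line tool. The assumptions are a squared-error measure of the distance between predicted and experimental Tm, a one-variable linear model (Tm as a baseline plus a fixed gain per LNA monomer), and a synthetic four-point data set; all names and numbers are invented for illustration.

```python
def fit_linear_tm(n_lna, tm_exp, learning_rate=0.05, steps=5000):
    """Fit Tm = baseline + gain * n_lna by gradient descent on the mean squared error."""
    baseline, gain = 0.0, 0.0
    n = len(tm_exp)
    for _ in range(steps):
        # Residuals: predicted minus experimental Tm for each oligonucleotide.
        residuals = [baseline + gain * x - t for x, t in zip(n_lna, tm_exp)]
        grad_baseline = 2.0 * sum(residuals) / n
        grad_gain = 2.0 * sum(r * x for r, x in zip(residuals, n_lna)) / n
        # Step against the gradient of the error function.
        baseline -= learning_rate * grad_baseline
        gain -= learning_rate * grad_gain
    return baseline, gain


# Synthetic data (not measurements): number of LNA monomers and Tm in °C.
n_lna_counts = [0, 1, 2, 3]
tm_values = [30.0, 35.0, 40.0, 45.0]
baseline, gain = fit_linear_tm(n_lna_counts, tm_values)
print(round(baseline, 3), round(gain, 3))
```

On this synthetic set the fit recovers a baseline near 30°C and a gain near 5°C per LNA monomer. A committee model in the sense of the text would combine the predictions of several separately trained models.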
Machine Learning And Thermodynamics 

The aim of this work has been to estimate Tm values as accurately as possible. To this
end, a machine learning approach has been adopted in which the prediction of the physical ΔH
and ΔS quantities is less important. The parameters of this model may be inaccurate as
thermodynamic quantities. First, the gradient descent algorithm produces a broad ensemble of
models in which the ΔH and ΔS parameters can vary substantially while maintaining the
accuracy of the predicted Tm. Second, the thermodynamic meaning of ΔH and ΔS is based on a
two-state assumption, which may not be realistic in every case. Even short oligonucleotides can
form different secondary structures or melt through multiple-state transitions (Tøstesen et al.,
J. Phys. Chem. B. 105:1618-1630, 2001). Third, the use of an optical instrument instead of a
calorimetric instrument (DSC) introduces an error in the measured ΔH and ΔS. Nevertheless,
the uncertain thermodynamic interpretation of the ΔH and ΔS model parameters does not imply
that the Tm prediction model is unreliable.
Results

The Tm prediction model has been tested on two data sets that were not used during the
training process. One set consisted of pure DNA oligonucleotides without LNA monomers and
had a standard deviation of the residuals (SEP) of 1.57 degrees. The other set consisted of
mixed oligonucleotides with both LNA and DNA and had a SEP of 5.25 degrees. The
difference in prediction accuracy between the two types of oligonucleotides suggests that Tm
prediction of mixed strands is a more complex task than Tm prediction of pure DNA. This is
possibly due to irregularities in the duplex helical structure induced by the LNA monomers
(Nielsen et al., Bioconjug. Chem. 11:228-238, 2000). The obtained prediction accuracy is in
both cases adequate for most biological applications. In conclusion, the reduced nearest
neighbor model implemented at the Exiqon website (www.LNA-Tm.com) can predict Tm
surprisingly well for both types of oligonucleotides. This indicates that the parameter reduction
strategy is applicable to other types of modified oligonucleotides.

Example 16: Optional Algorithm to Optimize the Substitution Pattern of Nucleic Acids of the
Invention

High affinity nucleotides such as LNA and other nucleotides that are conformationally
restricted to prefer the C3'-endo conformation, or nucleotides with a modified backbone and/or
nucleobase, stabilize a double helix configuration. As these effects are generally additive, the
most stable duplex between a high affinity capture oligonucleotide and an unmodified target
oligonucleotide should generally arise when all nucleotides in the capture probe or primer are
replaced by their high affinity analogue. The most stable duplex should thus be formed
between a fully modified LNA capture probe and the corresponding DNA/RNA target
molecule. Such a fully modified capture probe should be more efficient in capturing target
molecules, and the resulting duplex is more thermally stable.

However, many high affinity nucleotides (e.g., LNA) have an even higher affinity for
other high affinity nucleotides (e.g., LNA) than for DNA/RNA. A fully modified capture
probe may thus form duplexes with itself or, if it is long enough, internal hairpins that are even
more stable than duplexes with the desired target molecule. Probes with even a small inverse
repeat segment where all constituent positions are substituted with high affinity nucleotides
may bind to themselves and be unable to bind the target. Thus, a sequence dependent substitution
pattern is desirably used to avoid substitutions in positions that may form self-complementary
base-pairs.

For example, a computer algorithm can be used to automatically determine the optimal
substitution pattern for any given capture probe sequence according to the following two
criteria. First, the difference between the stability of (i) the duplex formed between the capture
probe and the target molecule and (ii) the best possible duplex between two capture probes
should be above a certain threshold. If this is not possible, then the substitution pattern with the
largest possible difference is chosen. Second, the capture probe should contain as many
substitutions as possible in order to bind as much target as possible at any given temperature
and to increase the thermal stability of the formed duplex. Alternatively, the second criterion is
replaced with the following alternative criterion to obtain capture probes with similar thermal
stability: the number and position of capture probe substitutions should be adjusted so that all
the duplexes between capture probes and targets have a similar thermal stability (i.e., Tm
equalization).

For oligonucleotide capture probes, incomplete matches between target and
capture probe are likely to be a reproducible feature of the recorded biosignatures. For short
probes, the second criterion for increasing thermal stability is more desirable than the alternative
second criterion for Tm equalization. For long capture probes and PCR primers, the
alternative second criterion is desirably used since Tm equalization is desirable for these probes
and primers.

An exemplary algorithm works as follows. For each nucleotide sequence of length n in an
array, all possible substitution patterns, i.e., 2^n different sequences, are evaluated. Each
evaluation consists of estimating the energetic stability of the duplex between the substituted
capture sequence and a perfect match unmodified target ("target duplex") and the energetic
stability of the most stable duplex that can be formed between two substituted capture probes
themselves ("self duplex").

The energetic stability estimate for a duplex may be calculated, e.g., using a Smith-Waterman
algorithm with the following scoring matrix.

Gap initiation penalty: -8
Gap continuation penalty: -50

      a    c    g    t    A    C    G    T
a    -2
c    -2   -2
g    -2    3   -2
t     2   -2    1   -2
A    -3   -3   -3    4   -3
C    -3   -3    6   -3   -3   -3
G    -3    6   -3    2   -3    9   -3
T     4   -3    2   -3    6   -3    3   -3

This scoring matrix was based partly on the best parameter fit to a large number (over 1000) of
melting curves of different DNA and LNA containing duplexes and partly on visual scoring of
test capture probe efficiency. If desired, this scoring matrix may be optimized by optimizing
the parameter fit as well as increasing or optimizing the dataset used to obtain these parameters.
As an example of these calculations, consider the heptamer sequence ATGCAGA, in which each
position can be either an LNA or a DNA nucleotide. The target duplex formed between
a fully modified capture probe with this sequence and its unmodified target receives a score of
34, as illustrated below.

Capture sequence: A-T-G-C-A-G-A
                  | | | | | | |
Target sequence:  t-a-c-g-t-c-t
Score:            4+4+6+6+4+6+4 = 34
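
The target duplex score is the sum of the matrix entries for each capture-probe monomer paired with its DNA complement. A minimal sketch follows. It assumes that the lower-triangular matrix given above is symmetric, and the names `SCORE`, `DNA_COMPLEMENT` and `target_duplex_score` are hypothetical helpers introduced for illustration. Because a perfect-match target duplex needs no gaps, the gap penalties are not used here.

```python
ORDER = "acgtACGT"  # lowercase = DNA monomer, uppercase = LNA monomer

# Lower triangle of the scoring matrix, one row per monomer in ORDER.
LOWER_TRIANGLE = [
    [-2],
    [-2, -2],
    [-2, 3, -2],
    [2, -2, 1, -2],
    [-3, -3, -3, 4, -3],
    [-3, -3, 6, -3, -3, -3],
    [-3, 6, -3, 2, -3, 9, -3],
    [4, -3, 2, -3, 6, -3, 3, -3],
]

# Expand to a symmetric lookup keyed by (monomer, monomer).
SCORE = {}
for i, row in enumerate(LOWER_TRIANGLE):
    for j, value in enumerate(row):
        SCORE[ORDER[i], ORDER[j]] = value
        SCORE[ORDER[j], ORDER[i]] = value

DNA_COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def target_duplex_score(probe: str) -> int:
    """Score a capture probe against its perfect-match unmodified DNA target."""
    total = 0
    for monomer in probe:
        target_base = DNA_COMPLEMENT[monomer.lower()]
        total += SCORE[monomer, target_base]
    return total


print(target_duplex_score("ATGCAGA"))  # 34
```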

The most stable self duplex that can be formed between two modified capture probes has an
almost equivalent energetic stability with a score of 30, as illustrated below (the second strand
is a second copy of the capture probe written in the opposite direction).

Capture sequence:   A-T-G-C-A-G-A
                      | | | |
Target sequence:  A-G-A-C-G-T-A
Score:                6+9+9+6 = 30


Thus, the capture probe efficiency of a fully modified probe is likely reduced by its propensity
to form a stable duplex with itself. In contrast, by choosing a slightly different substitution
pattern, ATGcaGA, in which capital letters represent LNA nucleotides, the stability of the target
duplex is reduced slightly from 34 to 29.

Capture sequence: A-T-G-c-a-G-A
                  | | | | | | |
Target sequence:  t-a-c-g-t-c-t
Score:            4+4+6+3+2+6+4 = 29


However, the most stable self complementary duplex that can be formed is reduced much more,
from 30 to 20, as illustrated below.

Capture sequence:   A-T-G-c-a-G-A
                      | | | |
Target sequence:  A-G-a-c-G-T-A
Score:                4+6+6+4 = 20


The difference between the stability of the desired target duplex and the
undesired self duplex can be further increased by using the capture sequence AtgcaGA, where
the target duplex has a score of 24.

Capture sequence: A-t-g-c-a-G-A
                  | | | | | | |
Target sequence:  t-a-c-g-t-c-t
Score:            4+2+3+3+2+6+4 = 24

Whereas the score of the self duplex is only 10, as shown below.

Capture sequence:   A-t-g-c-a-G-A
                      | | | |
Target sequence:  A-G-a-c-g-t-A
Score:                2+3+3+2 = 10


The additional destabilization of the self duplex is generally not required if the difference in
stability between the target duplex and self duplex is above a threshold of 25% of the target
duplex stability, as illustrated below.

Discrimination for ATGCAGA = (34-30)/34 = 12% < threshold (25%)
Discrimination for ATGcaGA = (29-20)/29 = 31% ≥ threshold (25%)
Discrimination for AtgcaGA = (24-10)/24 = 58% ≥ threshold (25%)

Thus, ATGcaGA is the substitution pattern with the highest degree of substitution for which the
target duplex is adequately more stable than the best self duplex
(e.g., above 25%).
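
The worked example can be reproduced with a short enumeration over all 2^n substitution patterns. This is a hedged sketch under a stated simplification, not the algorithm as claimed. The self duplex is scored by an ungapped local alignment of the probe against a second, antiparallel copy of itself, so the gap initiation and continuation penalties are left out. The matrix is again read as symmetric, and all function names are hypothetical.

```python
from itertools import product

ORDER = "acgtACGT"  # lowercase = DNA monomer, uppercase = LNA monomer
LOWER_TRIANGLE = [
    [-2], [-2, -2], [-2, 3, -2], [2, -2, 1, -2],
    [-3, -3, -3, 4, -3], [-3, -3, 6, -3, -3, -3],
    [-3, 6, -3, 2, -3, 9, -3], [4, -3, 2, -3, 6, -3, 3, -3],
]
SCORE = {}
for i, row in enumerate(LOWER_TRIANGLE):
    for j, value in enumerate(row):
        SCORE[ORDER[i], ORDER[j]] = SCORE[ORDER[j], ORDER[i]] = value

DNA_COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def target_duplex_score(probe):
    """Score of the probe paired with its perfect-match unmodified DNA target."""
    return sum(SCORE[m, DNA_COMPLEMENT[m.lower()]] for m in probe)


def self_duplex_score(probe):
    """Best ungapped antiparallel pairing of the probe with a copy of itself."""
    n = len(probe)
    best = 0
    for k in range(2 * n - 1):  # each k is one register: position i pairs with k - i
        running = 0
        for i in range(max(0, k - n + 1), min(n - 1, k) + 1):
            # Local-alignment recurrence without gaps: never carry a negative sum.
            running = max(0, running + SCORE[probe[i], probe[k - i]])
            best = max(best, running)
    return best


def discrimination(probe):
    """(target score - self score) / target score, as in the worked example."""
    target = target_duplex_score(probe)
    return (target - self_duplex_score(probe)) / target


def lna_count(probe):
    return sum(m.isupper() for m in probe)


def best_patterns(sequence, threshold=0.25):
    """Most heavily substituted patterns whose discrimination meets the threshold."""
    probes = ["".join(b.upper() if lna else b.lower() for b, lna in zip(sequence, mask))
              for mask in product((False, True), repeat=len(sequence))]
    passing = [p for p in probes if discrimination(p) >= threshold]
    if not passing:  # first criterion cannot be met: take the largest difference
        return [max(probes, key=discrimination)]
    most = max(lna_count(p) for p in passing)
    return [p for p in passing if lna_count(p) == most]


for probe in ("ATGCAGA", "ATGcaGA", "AtgcaGA"):
    print(probe, target_duplex_score(probe), self_duplex_score(probe),
          f"{discrimination(probe):.0%}")
print(best_patterns("ATGCAGA"))
```

Under this simplified scoring, no pattern with six or seven LNA monomers reaches the 25% threshold for this heptamer, and ATGcaGA is among the five-LNA patterns that do. More than one five-LNA pattern can tie, so the sketch returns all of them rather than choosing one.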
This algorithm can be used to determine desirable substitution patterns for any size of
capture probe or any given probe sequence. The following simple design rules may also be
applied for probe design, especially for short probes. The best self alignment for the
corresponding DNA capture probe sequence is determined using a simple Smith-Waterman
scoring matrix of:

      a    c    g    t
a    -2
c    -2   -2
g    -2    3   -2
t     2   -2    1   -2

Additionally, all possible positions in the sequence are substituted, except that substitution
of both bases of a self-complementary base-pair is desirably avoided. The most
stable self duplex thus does not contain any LNA:LNA base-pairs but only LNA:DNA
base-pairs.

Exemplary Computer 

Any of the methods described herein may be implemented using virtually any computer.
Fig. 44 shows such an exemplary computer system. Computer system 2 includes internal and
external components. The internal components include a processor 4 coupled to a memory 6.
The external components include a mass-storage device 8, e.g., a hard disk drive, user input
devices 10, e.g., a keyboard and a mouse, a display 12, e.g., a monitor, and usually, a network
link 14 capable of connecting the computer system to other computers to allow sharing of data
and processing tasks. Programs are loaded into the memory 6 of this system 2 during operation.
These programs include an operating system 16, e.g., Microsoft Windows, which manages the
computer system, software 18 that encodes common languages and functions to assist programs
that implement the methods of this invention, and software 20 that encodes the methods of the
invention in a procedural language or symbolic package. Languages that can be used to
program the methods include, without limitation, Visual C/C++ from Microsoft. In preferred
applications, the methods of the invention are programmed in mathematical software packages
that allow symbolic entry of equations and high-level specification of processing, including
algorithms used in the execution of the programs, thereby freeing a user of the need to program
procedurally individual equations or algorithms. An exemplary mathematical software package
useful for this purpose is Matlab from Mathworks (Natick, MA). Using the Matlab software,
one can also apply the Parallel Virtual Machine (PVM) module and Message Passing Interface
(MPI), which support processing on multiple processors. This implementation of PVM and
MPI with the methods herein is accomplished using methods known in the art. Alternatively,
the software or a portion thereof is encoded in dedicated circuitry by methods known in the art.

Example 17: Exemplary Locked Nucleic Acids (LNA)

As disclosed in WO 99/14226, LNA are DNA analogues that form DNA- or RNA-heteroduplexes
with exceptionally high thermal stability. LNA units include bicyclic
compounds as shown immediately below, where ENA refers to 2'-O,4'-C-ethylene-bridged
nucleic acids.

[Structure drawing: ENA]

References herein to Locked Nucleoside Analogues, LNA units, LNA monomers, or
similar terms are inclusive of such compounds as disclosed in WO 99/14226, WO 00/56746,
WO 00/56748, and WO 00/66604.
Desirable LNA monomers and oligomers share some chemical properties of DNA and
RNA; they are water soluble, can be separated by agarose gel electrophoresis, and can be
ethanol precipitated.

Desirable LNA monomers and oligonucleotide units include nucleoside units having a
2'-4' cyclic linkage, as described in International Patent Applications WO 99/14226,
WO 00/56746, WO 00/56748, and WO 00/66604. Desirable LNA monomer structures are
exemplified in the formulae Ia and Ib below. In formula Ia the configuration of the furanose is
denoted D-β, and in formula Ib the configuration is denoted L-α. Configurations which are
composed of mixtures of the two, e.g., D-β and L-α, are also included.

[Structure drawings: formulae Ia and Ib]

In Ia and Ib, X is oxygen, sulfur or carbon; B is a universal or modified base
(particularly a non-naturally occurring base), e.g., pyrene and pyridyloxazole derivatives, pyrenyl,
or pyrenylmethylglycerol moieties, all of which may be optionally substituted. Other desirable
universal bases include pyrrole, diazole or triazole moieties, all of which may be optionally
substituted, and other groups, e.g., modified adenine, cytosine, 5-methylcytosine, isocytosine,
pseudoisocytosine, guanine, thymine, uracil, 5-bromouracil, 5-propynyluracil,
5-propynyl-6-fluorouracil, 5-methylthiazoleuracil, 6-aminopurine, 2-aminopurine, inosine,
diaminopurine, 7-propyne-7-deazaadenine, and 7-propyne-7-deazaguanine. R1, R2 or R2', R3 or
R3', R5 and R5' are hydrogen, methyl, ethyl, propyl, propynyl, aminoalkyl, methoxy, propoxy,
methoxy-ethoxy, fluoro, or chloro.

P designates the radical position for an internucleoside linkage to a succeeding
monomer, or a 5'-terminal group. R3 or R3' is an internucleoside linkage to a preceding
monomer, or a 3'-terminal group. The internucleotide linkage may be a phosphate,
phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate,
phosphorodiselenoate, alkylphosphotriester, or methyl phosphonate. The internucleotide
linkage may also contain non-phosphorous linkers, hydroxylamine derivatives (e.g.,
-CH2-NCH3-O-CH2-), hydrazine derivatives (e.g., -CH2-NCH3-NCH3-CH2-), or amide derivatives
(e.g., -CH2-CO-NH-CH2- or -CH2-NH-CO-CH2-). In Formula Ia, R4' and R2' together designate
-CH2-O-, -CH2-S-, -CH2-NH-, -CH2-NMe-, -CH2-CH2-O-, -CH2-CH2-S-, -CH2-CH2-NH-, or
-CH2-CH2-NMe-, where the oxygen, sulfur or nitrogen, respectively, is attached to the
2'-position. In Formula Ib, R4' and R2 together designate -CH2-O-, -CH2-S-, -CH2-NH-,
-CH2-NMe-, -CH2-CH2-O-, -CH2-CH2-S-, -CH2-CH2-NH-, or -CH2-CH2-NMe-, where the oxygen,
sulfur or nitrogen, respectively, is attached to the 2'-position (R2 configuration).

Desirable LNA monomer structures are structures in which X is oxygen (Formula Ia and
Ib); B is a universal base such as pyrene; R1, R2 or R2', R3 or R3', R5 and R5' are hydrogen; P is a
phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, or methyl phosphonate;
and R3 or R3' is an internucleoside linkage to a preceding monomer, or a 3'-terminal group. In
Formula Ia, R4' and R2' together designate -CH2-O-, -CH2-S-, -CH2-NH-, -CH2-NMe-,
-CH2-CH2-O-, -CH2-CH2-S-, -CH2-CH2-NH-, or -CH2-CH2-NMe-, where the oxygen, sulfur or
nitrogen, respectively, is attached to the 2'-position, and in Formula Ib, R4' and R2 together
designate -CH2-O-, -CH2-S-, -CH2-NH-, -CH2-NMe-, -CH2-CH2-O-, -CH2-CH2-S-, -CH2-CH2-NH-,
or -CH2-CH2-NMe-, where the oxygen, sulfur or nitrogen, respectively, is attached to the
2'-position in the R2 configuration.
Particularly desirable LNA monomers for incorporation into an oligonucleotide of the invention
include those of the following formula IIa

[Structure drawing: formula IIa]

wherein X is oxygen, sulfur, nitrogen, substituted nitrogen, carbon or substituted carbon, and
desirably is oxygen; B is a modified base as discussed above, e.g., an optionally substituted
carbocyclic aryl such as optionally substituted pyrene or optionally substituted
pyrenylmethylglycerol, or an optionally substituted heteroalicyclic or optionally substituted
heteroaromatic such as optionally substituted pyridyloxazole. Other desirable universal bases
include pyrrole, diazole or triazole moieties, all of which may be optionally substituted; R1*,
R2, R3, R5 and R5* are hydrogen; P designates the radical position for an internucleoside linkage
to a succeeding monomer, or a 5'-terminal group; R3* is an internucleoside linkage to a
preceding monomer, or a 3'-terminal group; and R2* and R4* together designate -O-CH2- or
-CH2-CH2-O- where the oxygen is attached in the 2'-position, or a linkage of -(CH2)n- where n is
2, 3 or 4, desirably 2, or a linkage of -S-CH2- or -NH-CH2-.

LNA units of formula IIa where R2* and R4* contain oxygen are sometimes referred to
herein as "oxy-LNA"; units of formula IIa where R2* and R4* contain sulfur are sometimes
referred to herein as "thio-LNA"; and units of formula IIa where R2* and R4* contain nitrogen
are sometimes referred to herein as "amino-LNA". For many applications, oxy-LNA units are
desirable modified nucleic acid units of oligonucleotides of the invention.

Particularly desirable LNA monomers for use in oligonucleotides of the invention are
2'-deoxyribonucleotides, ribonucleotides, and analogues thereof that are modified at the
2'-position in the ribose, such as 2'-O-methyl, 2'-fluoro, 2'-trifluoromethyl,
2'-O-(2-methoxyethyl), 2'-O-aminopropyl, 2'-O-dimethylamino-oxyethyl, 2'-O-fluoroethyl or
2'-O-propenyl, and analogues wherein the modification involves both the 2'- and 3'-position,
desirably such analogues wherein the modification links the 2'- and 3'-position in the ribose,
such as those described in Nielsen et al., J. Chem. Soc., Perkin Trans. 1, 1997, 3423-33, and in
WO 99/14226, and analogues wherein the modification involves both the 2'- and 4'-position,
desirably such analogues wherein the modification links the 2'- and 4'-position in the ribose,
such as analogues having a -CH2-S- or a -CH2-NH- or a -CH2-NMe- bridge (see Singh et al., J.
Org. Chem. 1998, 63, 6078-9). Although LNA monomers having the β-D-ribo configuration are
often the most applicable, other configurations also are suitable for purposes of the invention.
Of particular use are the α-L-ribo, the β-D-xylo and the α-L-xylo configurations (see Beier et al.,
Science, 1999, 283, 699 and Eschenmoser, Science, 1999, 284, 2118), in particular those
having a 2'-4' -CH2-S-, -CH2-NH-, -CH2-O- or -CH2-NMe- bridge.

In another desirable embodiment, LNA modified oligonucleotides used in this invention
comprise oligonucleotides containing at least one LNA monomeric unit of the general scheme
A above, wherein X, B, P are defined as above. One of the substituents R2, R2*, R3, and R3* is
a group P* which designates an internucleoside linkage to a preceding monomer, or a
2'/3'-terminal group. Two of the substituents of R1*, R2, R2*, R3, R4*, R5, R5*, R6, R6*, R7,
and R7* when taken together designate a biradical structure selected from -(CR*R*)r-M-(CR*R*)s-,
-(CR*R*)r-M-(CR*R*)s-M-, -M-(CR*R*)r+s-M-, -M-(CR*R*)r-M-(CR*R*)s-, -(CR*R*)r+s-, -M-, and
-M-M-, wherein each M is independently selected from -O-, -S-, -Si(R*)2-, -N(R*)-, >C=O,
-C(=O)-N(R*)-, and -N(R*)-C(=O)-. Each R* and R1(1*)-R7(7*), which are not involved in the
biradical, are independently selected from hydrogen, halogen, azido, cyano, nitro, hydroxy,
mercapto, amino, mono- or di(C1-6-alkyl)amino, optionally substituted C1-6-alkoxy, optionally
substituted C1-6-alkyl, DNA intercalators, photochemically active groups, thermochemically
active groups, chelating groups, reporter groups, and ligands, and/or two adjacent
(non-geminal) R* may together designate a double bond, and each of r and s is 0-4 with the
proviso that the sum r+s is 1-5.

Examples of LNA units are shown in scheme B:

[Structure drawings: Scheme B]

wherein the groups X and B are defined as above. P designates the radical position for an
internucleoside linkage to a succeeding monomer, nucleoside such as an L-nucleoside, or a
5'-terminal group, such internucleoside linkage or 5'-terminal group optionally including the
substituent R5. One of the substituents R2, R2*, R3, and R3* is a group P* which designates an
internucleoside linkage to a preceding monomer, or a 2'/3'-terminal group.

Desirable nucleosides are L-nucleosides such as, for example, derived dinucleoside
monophosphates. The nucleoside can be comprised of either a beta-D, a beta-L or an alpha-L
nucleoside. Desirable nucleosides may be linked as dimers wherein at least one of the
nucleosides is a beta-L or alpha-L. B may also designate the pyrimidine bases cytosine,
5-methyl-cytosine, thymine, uracil, or 5-fluorouridine (5-FUdR), other 5-halo compounds, or the
purine bases adenosine, guanosine or inosine.

As discussed above, a variety of LNA units may be employed in the monomers and
oligomers of the invention including bicyclic and tricyclic DNA or RNA having a 2'-4' or 2'-3'
sugar linkage; 2'-O,4'-C-methylene-β-D-ribofuranosyl moiety, known to adopt a locked
C3'-endo RNA-like furanose conformation. Illustrative modified structures that may be included in
oligonucleotides of the invention are shown in Figure 1. Other nucleic acid units that may be
included in an oligonucleotide of the invention may comprise 2'-deoxy-2'-fluoro
ribonucleotides; 2'-O-methyl ribonucleotides; 2'-O-methoxyethyl ribonucleotides; peptide
nucleic acids; 5-propynyl pyrimidine ribonucleotides; 7-deazapurine ribonucleotides;
2,6-diaminopurine ribonucleotides; and 2-thio-pyrimidine ribonucleotides, and nucleotides with
other sugar groups (e.g., xylose).

Oligonucleotides containing LNA are readily synthesized by standard phosphoramidite
chemistry. The flexibility of the phosphoramidite synthesis approach further facilitates the easy
production of LNA oligos carrying all types of standard linkers, fluorophores and reporter
groups.

Example 18: Selective Binding Complementary (SBC) Nucleotides

Selective Binding Complementary (SBC) nucleotides are unable to form stable hybrids
with each other, yet are able to form stable, sequence-specific hybrids with complementary
unmodified strands of nucleic acids. Thus, the reduced ability of SBC oligonucleotides to form
intramolecular hydrogen bond base-pairs between regions of substantially complementary
sequence causes a reduced level of secondary structure. Self-complementarity is an important
issue in nucleic acid technologies as reported for DNA, PNA and LNA, and in different
biological applications, especially in the field of homogeneous assays. LNA:LNA duplexes are
the most thermally stable nucleic acid type duplex system known, making the reduction of
self-complementarity even more important.

Exemplary SBC oligonucleotides contain 2-amino-A (D) and 2-thio-T (2sT) incorporated in the
same oligonucleotide as replacements of A and T, respectively. The SBC name refers to the
fact that D and 2sT form a destabilised base-pair compared to the A-T base-pair, whereas D-T and
2sT-A base-pairs are normally more stable than the original A-T base-pair. Exemplary SBC-G
nucleotides include inosine or LNA-inosine, and exemplary SBC-C nucleotides include
PyrroloPyr, LNA-PyrroloPyr, ^C, and LNA-^C (Fig. 4). Other exemplary SBC nucleotides
are shown in Figs. 2 and 4-8. If desired, SBC nucleotides may be incorporated into the nucleic
acids and arrays of the invention, using standard methods.
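
The pairing rules described above can be illustrated with a small sketch. It is not part of the disclosed methods: the one-letter codes 'D' (2-amino-A) and 'S' (2-thio-T), the function names, and the example sequences are assumptions chosen for illustration, and base-pair stability is reduced to a simple allowed/not-allowed table. The sketch compares the longest self-complementary stem an oligonucleotide could form before and after SBC substitution, and checks that the substituted oligonucleotide can still pair with an unmodified complementary strand.

```python
# Illustrative model only: 'D' = 2-amino-A and 'S' = 2-thio-T are assumed codes.
# D-S is deliberately absent from the table (destabilised pair in the text),
# while D-T and S-A are allowed (stable pairs in the text).
STABLE_PAIRS = {
    ("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
    ("D", "T"), ("T", "D"),
    ("S", "A"), ("A", "S"),
}


def can_pair(x, y):
    """Return True if bases x and y are treated as forming a stable pair."""
    return (x, y) in STABLE_PAIRS


def to_sbc(seq):
    """Replace every A with D and every T with S (hypothetical SBC substitution)."""
    return seq.translate(str.maketrans("AT", "DS"))


def longest_self_stem(seq, min_loop=3):
    """Length of the longest contiguous antiparallel stem within one strand.

    At least min_loop unpaired bases must remain between the two stem halves.
    """
    best = 0
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n):
            k = 0
            while i + k < j - k - min_loop and can_pair(seq[i + k], seq[j - k]):
                k += 1
            best = max(best, k)
    return best


def binds_target(probe, target):
    """True if probe pairs at every position with an antiparallel target strand."""
    return len(probe) == len(target) and all(
        can_pair(p, t) for p, t in zip(probe, reversed(target))
    )


probe = "ATATATCCCCATATAT"       # hypothetical self-complementary oligonucleotide
target = "ATATATGGGGATATAT"      # its unmodified complementary strand
sbc_probe = to_sbc(probe)
print(longest_self_stem(probe), longest_self_stem(sbc_probe),
      binds_target(sbc_probe, target))
```

In this toy model the unmodified oligonucleotide can fold into a hairpin, whereas the substituted one cannot, yet it still pairs along its whole length with the unmodified target.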

The systems disclosed herein can provide significant nucleic acid probes for universal
hybridization. In particular, universal hybridization can be accomplished with a
conformationally restricted monomer, including a desirable pyrene LNA monomer. Universal
hybridization behavior also can be accomplished in an RNA context. Additionally, the binding
affinity of probes for universal hybridization can be increased by the introduction of high
affinity monomers without compromising the base-pairing selectivity of bases neighboring the
universal base.

Incorporation of one or more modified nucleobases or nucleosidic bases into an
oligonucleotide can provide significant advantages. Among other things, LNA oligonucleotides
can often self-hybridize, rather than hybridize to another oligonucleotide. Use of one or more
modified bases with the LNA units can modulate the propensity of the oligonucleotide to form
double stranded structures with other oligonucleotides containing modified nucleobases,
including internal duplex formation, thereby inhibiting undesired self-hybridization.

Example 19: Exemplary Methods for Synthesizing LNA-2-thiopyrimidine Nucleosides and 
Nucleotides 

2-Thiopyrimidine nucleosides can be prepared in several ways as described below. For
example, the 2-thiouridine nucleosides (IV) can be synthesized from a substituted uridine
nucleoside (VIII) as described in the scheme below. After protection of the O4-position (IX) on
the nucleobase, thionation can be performed at the O2 position, which results in the 2-thio-uridine
nucleoside (IV). Performing sulphurisation on both O2 and O4 results in the 2,4-dithio-uridine
nucleoside (X), which may be transformed into the 2-thio-uridine nucleoside (IV) (Saladino et
al., Tetrahedron, 1996, 52, 6759). Another way is to generate a cyclic ether (XI) through
reaction with the 5' position; this product can then be transformed to the 2-thio-uridine
nucleoside (IV) or the 2-O-alkyl-uridine nucleoside (XII). The 2-O-alkyl-uridine nucleoside
(XII) can also be generated by direct alkylation of the uridine nucleoside (VIII). The
2-O-alkyl-uridine nucleoside (XII) can also be transformed into the 2-thio-uridine
nucleoside (Brown et al., J. Chem. Soc. 1957, 868; Singer et al., Proc. Natl. Acad. Sci. USA,
1983, 80, 4884; Rajur and McLaughlin, Tetrahedron Lett., 1992, 33, 6081).


In another method, Lewis acid-catalyzed condensation of a properly substituted sugar (I)
and a substituted 2-thio-uracil (II) can result in a substituted 2-thio-uridine nucleoside of the
structure (III), which by further synthetic manipulations can be transformed into the LNA 2-
thiouridine nucleoside (IV) (Hamamura et al., Moffatt, J. Med. Chem., 1972, 15, 1061; Bretner
et al., J. Med. Chem., 1993, 36, 3611).


Using a properly substituted amino-sugar (V), a 2-thio-uridine nucleoside can be
synthesized through ring-synthesis of the nucleobase by reaction of the amino sugar (V) and a
substituted isothiocyanate (VI), yielding the substituted LNA 2-thio-uracil nucleoside (VII)
(Shaw and Warrener, J. Chem. Soc. 1957, 153; Cusack et al., J. Chem. Soc. Perkin I, 1973,
1721).





Example 20: Exemplary Methods for Synthesizing 2sT-LNA

Three different strategies for synthesis of 2sT-LNA are outlined in the Summary of the
Invention section. Strategy A involves coupling a glycosyl-donor and a nucleobase, using
standard methodology for synthesis of existing LNA monomers. Strategy B involves ring
synthesis of the nucleobase. This strategy is desirable because the availability of 1-amino-LNA
enables introduction of a variety of new nucleobases. Strategy C includes modification of T-
LNA; the easy synthesis of LNA-T diol makes this an attractive pathway.

In a desirable embodiment, 2sT-LNA is synthesized as illustrated in the scheme below.

In particular, the known coupling sugar 1,2-di-O-acetyl-3,5-di-O-benzyl-4-C-
mesyloxymethyl-α,β-D-ribofuranose 1 was coupled with the nucleobase 2-thio-thymine in a
Vorbrüggen type reaction. Thus, the nucleobase was silylated and condensed with the sugar
using SnCl4 as catalyst to promote the reaction, affording nucleoside 2. Mass spectrometry and
NMR subsequently identified the isolated product as the desired one. NMR data were compared
with published data of a 2-thio-thymidine derivative (Kuimelis and Nambiar, Nucleic Acids
Res., 1994, 22, 1429-1436) in order to validate the correct attachment point of the nucleobase.

Subsequently, a base mediated ring-closing reaction afforded the di-benzylated LNA
derivative 3 in 77% yield. The signals in the 1H-NMR spectrum of the compound appeared as
singlets, thus proving that the cyclization had occurred to give the LNA skeleton, in which the
1'-H and 2'-H are perpendicular to each other, causing the 3J(1',2') coupling constant to be 0 Hz. MALDI mass
spectrometry was likewise used for the identification of the compound.

The LNA derivative was protected at the nucleobase with the toluoyl protective group to
give 4. This group is well known for the protection of 2-thio-thymidine derivatives (Kuimelis
and Nambiar, Nucleic Acids Res., 1994, 22, 1429-1436). The protection of the nucleobase
occurs at both the N-3 and the O-4 position and hence the compound is isolated as a mixture of
two compounds. NMR shows that the ratio of the two isomers in the isolated mixture is 2:1.

These methods are described further below.

1-(2-O-acetyl-3-O,5-O-dibenzyl-4-C-mesyloxymethyl-β-D-ribofuranosyl)-2-thio-thymine (2)

1,2-di-O-acetyl-3,5-di-O-benzyl-4-C-mesyloxymethyl-α,β-D-ribofuranose (1, 2.0 g,
3.83 mmol) and 2-thio-thymine (552 mg, 3.89 mmol) were co-evaporated with anhydrous
acetonitrile (100 ml) and redissolved in anhydrous acetonitrile (80 ml). N,O-
bis(trimethylsilyl)acetamide (1.5, 5.85 mmol) was added, and the reaction was stirred at 80°C for
one hour. The mixture was cooled to 0°C, SnCl4 (0.9 ml, 7.66 mmol) was added, and the
reaction was left to stir for 24 hours. The reaction mixture was diluted with EtOAc and washed
with NaHCO3 and subsequently with water. The organic phase was dried (Na2SO4) and
evaporated to dryness. The product was purified using column chromatography, giving the thio-
thymidine derivative 2 (1.1 g, 1.82 mmol, 40%) as a white foam. Rf (10%
THF/dichloromethane): 0.75.

MALDI-MS: 627 (M+Na). 13C-NMR (CDCl3): δ = 174.40, 169.29, 159.89, 136.13, 136.51,
136.05, 128.62, 128.56, 128.41, 128.29, 128.07, 127.89, 127.67, 116.18, 91.41, 86.21, 75.59,
75.31, 74.46, 74.22, 73.61, 69.25, 69.04, 37.52, 20.62, 11.91

(1R,3R,4R,7S)-7-(benzyloxy)-1-(benzyloxymethyl)-3-(2-thiothymidine)-2,5-
dioxabicyclo[2.2.1]heptane (3)

1-(2-O-acetyl-3-O,5-O-dibenzyl-4-C-mesyloxymethyl-β-D-ribofuranosyl)-2-thio-
thymine (2, 630 mg, 1.04 mmol) was dissolved in dioxane (15 ml) and water (8 ml), aqueous
NaOH (2 M, 5 ml) was added, and the reaction was left to stir at room temperature for one hour.
The yellow solution was neutralized with HCl (1 M, 6 ml), affording a precipitate. The mixture
was diluted with dichloromethane and ethyl acetate, causing an emulsion. After separation, the
aqueous phase was extracted with ethyl acetate, and the combined organic phase was dried
(Na2SO4) and evaporated to dryness. The compound was purified by column chromatography
(0-2, then 5% THF/dichloromethane), giving the ring closed compound 3 as a white foam
(370 mg, 0.79 mmol, 77%). Rf (2% MeOH/dichloromethane): 0.23.

MALDI-MS: 488 (M+Na). 13C-NMR (CDCl3): δ = 173.14, 160.39, 137.20, 136.63, 136.00,
128.46, 128.34, 128.02, 127.66, 115.52, 90.29, 87.77, 77.39, 75.26, 73.77, 72.07, 71.70, 64.15,
30.17, 12.33

1H-NMR (CDCl3): δ = 9.87 (s, 1H), 7.69 (d, J = 1.1 Hz, 1H), 7.26-7.37 (m, 10H), 6.13 (s, 1H), 4.84
(s, 1H), 4.66 (d, J = 11.3 Hz, 1H), 4.61 (s, 2H), 4.52 (d, J = 11.5 Hz, 1H), 4.04 (d, J = 7.7 Hz, 1H),
3.93 (s, 1H), 3.88 (d, J = 11.0 Hz, 1H), 3.82 (d, J = 7.7 Hz, 1H), 3.82 (d, J = 10.8 Hz, 1H), 1.59 (d,
J = 1.1 Hz, 3H)



(1R,3R,4R,7S)-7-(benzyloxy)-1-(benzyloxymethyl)-3-(2-thio-N3/O4-toluoyl-thymidine)-2,5-
dioxabicyclo[2.2.1]heptane (4)

(1R,3R,4R,7S)-7-(benzyloxy)-1-(benzyloxymethyl)-3-(2-thiothymidine)-2,5-
dioxabicyclo[2.2.1]heptane (3, 290 mg, 0.62 mmol) was dissolved in anhydrous pyridine and
diisopropylethylamine (0.2 ml, 1.15 mmol), toluoyl chloride (0.25 ml, 1.89 mmol) was added,
and the reaction mixture was stirred at room temperature for three hours. After completion, the
reaction mixture was diluted with dichloromethane, and the reaction was quenched by addition
of water. The phases were separated, and the organic phase was dried (Na2SO4) and evaporated
to dryness. The residue was co-evaporated with toluene. The product was purified by column
chromatography (0-1% MeOH/dichloromethane) to give nucleoside 4 as a white foam (320 mg,
0.55 mmol, 89%). Rf (2% MeOH/dichloromethane): 0.78.

MALDI-MS: 606 (M+Na). 13C-NMR (CDCl3): δ = 171.98, 168.30, 160.30, 145.92, 145.82,
137.22, 136.65, 135.98, 130.39, 130.27, 129.85, 129.50, 128.51, 128.41, 128.08, 127.73,
115.11, 90.10, 87.81, 76.01, 75.80, 75.39, 75.01, 73.83, 72.19, 72.09, 71.74, 64.15, 21.75,
12.40.
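
The yield and reagent stoichiometry quoted in the preparation of nucleoside 4 can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical, and the input values are the millimole amounts stated in the procedure above (0.62 mmol of compound 3, 1.89 mmol of toluoyl chloride, 0.55 mmol of isolated product).

```python
def percent_yield(product_mmol, limiting_mmol):
    """Isolated yield as a percentage of the limiting starting material."""
    return 100.0 * product_mmol / limiting_mmol


def equivalents(reagent_mmol, limiting_mmol):
    """Molar equivalents of a reagent relative to the limiting starting material."""
    return reagent_mmol / limiting_mmol


start_mmol = 0.62            # compound 3, as stated in the procedure
product_mmol = 0.55          # nucleoside 4, as stated in the procedure
toluoyl_chloride_mmol = 1.89

print(f"yield: {percent_yield(product_mmol, start_mmol):.0f}%")
print(f"toluoyl chloride: {equivalents(toluoyl_chloride_mmol, start_mmol):.1f} equiv")
```

The computed yield agrees with the 89% reported, and the acylating agent is used in roughly threefold excess.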

Example 21: Exemplary Methods for Synthesizing LNA-I, LNA-D, and LNA-2AP

2'-O,4'-C-methylene linked (LNA) nucleosides containing hypoxanthine (or inosine)
(LNA-I), 2,6-diaminopurine (LNA-D), and 2-aminopurine (LNA-2AP) nucleobases were
efficiently prepared via convergent syntheses. The nucleosides were converted into
phosphoramidite monomers and incorporated into LNA oligonucleotides using an automated
phosphoramidite method. The complexing properties of oligonucleotides containing these
LNA nucleosides were assessed against perfect and singly mismatched DNA.



[Structures: LNA-I, LNA-D, LNA-2AP]


Hypoxanthine, the base found in the nucleotides inosine and deoxyinosine, is considered as a
guanine analogue in nucleic acids.

Oligonucleotides containing 2,6-diaminopurine replacements for adenines are expected
to bind more strongly to their complementary sequences, especially as part of A-type helixes,
due to the potential formation of three hydrogen bonds with thymine or uracil. The reported
effect of 2,6-diaminopurine deoxyriboside (D) on the stability of polynucleotide duplexes
reaches, on average, about 1.5 °C per modification. Higher stabilization effects for mismatches
were observed for D nucleosides involved in formation of duplexes prone to form A-type
helixes. LNA D and LNA 2'-OMe-D are expected to have increased stabilization and
mismatch discrimination. LNA can be used in combination with 2-thio-T for construction of
selectively binding complementary oligonucleotides. Taking into consideration the extremely
high stability of LNA:LNA duplexes, this approach might be very useful for constructing
LNA containing capture probes and antisense reagents including drugs (Figs. 3 and 10).
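
As a rough illustration of the average figure quoted above, the sketch below estimates the cumulative duplex stabilization if every adenine in an oligonucleotide were replaced by D. It assumes a simple additive model at about 1.5 °C per modification; real duplexes depend on sequence context, so this is a back-of-envelope estimate only, and the function name and example sequence are hypothetical.

```python
DELTA_TM_PER_D = 1.5  # °C per D modification, the average value quoted in the text


def estimated_tm_gain(sequence):
    """Additive estimate of the Tm increase when every A is replaced by D."""
    n_substitutions = sequence.upper().count("A")
    return n_substitutions * DELTA_TM_PER_D


oligo = "GATTACA"  # hypothetical example sequence with three adenines
print(f"{estimated_tm_gain(oligo):.1f} °C estimated gain")
```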

2-Aminopurine (2-AP) is a fluorescent nucleobase (emission at 363 nm), which is useful
for probing nucleic acid structure and dynamics and for hybridizing with thymine in Watson-
Crick geometry. LNA-I, LNA-D, and/or LNA-2AP may be used in the nucleic acids of the
present invention, e.g., to increase the priming efficiency of DNA oligonucleotides in PCR
experiments and to construct selectively binding complementary agents.

Synthesis of LNA-I

The convergent method adopted for preparation of LNA monomers (Koshkin et al., J.
Org. Chem. 66:8504, 2001) was successfully applied for syntheses of the modified LNA
nucleotides 1-3. The synthetic route to LNA-I phosphoramidite 11 is depicted in the scheme
below. The previously described 4-C-branched furanose 4 (Koshkin et al., supra) was used as a
glycosyl donor in a coupling reaction with silylated hypoxanthine by the method of Vorbrüggen et
al. (Vorbrüggen et al., Chem. Ber. 114:1234, 1981; Vorbrüggen et al., Chem. Ber. 114:1256,
1981; and Vorbrüggen, Acta Biochim. Pol., 43:25, 1996). The reaction resulted in high-yield
formation of the desired β-configured nucleoside derivative 5. However, analogous to the
coupling reaction of 4 with protected guanines, the formation of the undesired N-7 isomer (ratio of
N-9/N-7 = 4:1) was also detected. The mixture of the isomers was used for the ring closing
reaction, and protected LNA nucleoside 6 was isolated in 68% yield as a crystalline compound.
The correct structure of the isolated isomer was confirmed later by chemical conversion of
LNA-I into LNA-A nucleoside (vide infra). Deprotection of the 5'-hydroxy group of 6 was
accomplished via the two-step procedure developed for the syntheses of other LNA nucleosides
(Koshkin et al., supra). First, the 5'-O-mesyl group was displaced by sodium benzoate to produce
nucleoside 7. The latter was converted into the 5'-hydroxy derivative 8 after saponification of the
5'-benzoate. Direct removal of the 3'-O-benzyl group from compound 8 was unsuccessful
under the conditions tested due to a solubility problem. Therefore, compound 8 was converted
to DMT-protected nucleoside 9 prior to catalytic debenzylation of the 3'-hydroxy group. The
phosphoramidite 11 was finally afforded via standard phosphitylation (McBride et al.,
Tetrahedron Lett. 24:245, 1983; Sinha et al., Tetrahedron Lett. 24:5843, 1983; and Sinha et al.,
Nucleic Acids Res. 12:4539, 1984) of the nucleoside 10. In order to verify the correct
orientation of the glycoside bond (N-9 isomer) in the synthesized LNA-I nucleoside, compound 7
was successfully converted into the known LNA-A derivative 13 (Koshkin et al., supra)
(Scheme 2). Thus, treatment of 7 with phosphoryl chloride according to the procedure
reported by Martin (Helv. Chim. Acta 78:486, 1995) resulted in high-yield formation of the 6-
chloropurine derivative 12. The adenosine derivative 13 was derived from 12 after reaction
with ammonia.

Exemplary Analytical Data

Data for compound 8 includes the following: mp 302-305 °C (dec). 1H NMR (DMSO-
d6): δ 8.16 (s, 1H), 8.06 (s, 1H), 7.30-7.20 (m, 5H), 5.95 (s, 1H), 4.69 (s, 1H), 4.63 (s, 2H), 4.28
(s, 1H), 3.95 (d, J = 7.7, 1H), 3.83 (m, 3H). 13C NMR (DMSO-d6): δ 156.6, 147.3, 146.1, 137.9,
137.3, 128.3, 127.6, 127.5, 124.5, 88.2, 85.4, 77.0, 72.1, 71.3, 56.7. MALDI-MS m/z: (M+H)+.
Anal. Calcd for C18H18N4O5·5/12 H2O: C, 57.21; H, 5.02; N, 14.82. Found: C, 57.47; H, 4.95;
N, 14.17.

For compound 11, 31P NMR (DMSO-d6): δ 148.90.

Scheme for Synthesis of LNA-Iᵃ

ᵃ Keys: (a) hypoxanthine, BSA, TMSOTf, 1,2-dichloroethane; 93%; (b) NaOH,
THF, EtOH, H2O; 69%; (c) NaOBz, DMSO; 76%; (d) NaOH, THF, MeOH, H2O;
85%; (e) DMT-Cl, pyridine; 92%; (f) Pd/C, HCO2NH4; 77%; (g) 2-cyanoethyl-N,N-
diisopropylphosphoramidochloridite, DIPEA, DMF; 75%.
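
Since the scheme keys list a yield for each of the seven steps (a) through (g), the overall yield of the linear sequence follows by multiplying the individual step yields. The short sketch below performs that multiplication; it is an illustrative calculation rather than a figure stated in the source, and the function name is hypothetical.

```python
import math


def overall_yield(step_yields_percent):
    """Overall percentage yield of a linear sequence from per-step percentage yields."""
    fraction = math.prod(y / 100.0 for y in step_yields_percent)
    return 100.0 * fraction


# Step yields (a)-(g) as listed in the scheme keys for LNA-I.
lna_i_steps = [93, 69, 76, 85, 92, 77, 75]
print(f"overall yield: {overall_yield(lna_i_steps):.1f}%")
```

On these figures the seven-step route from furanose 4 to phosphoramidite 11 proceeds in roughly 22% overall yield.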

Exemplary Experimental Conditions 

(1R,3R,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-1-(4,4'-
dimethoxytrityloxymethyl)-3-(hypoxanthin-9-yl)-2,5-dioxabicyclo[2.2.1]heptane (11)

Compound 10 (530 mg, 0.90 mmol; described previously, see for example WO
00/56746) was dissolved in anhydrous EtOAc (5 mL) and cooled in an ice-bath. DIPEA
(0.47 mL, 2.7 mmol) and 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite (250 µL,
1.1 mmol) were added under intensive stirring.
Formation of insoluble material was observed, and CH2Cl2 (3 mL) was added to produce
a clear solution. More 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite (200 µL,
0.88 mmol) was added after one hour, and the mixture was stirred overnight. EtOAc (30
mL) was added, and the mixture was washed with sat. NaHCO3 (2 x 50 mL) and brine (50 mL),
dried (Na2SO4), and concentrated to a solid residue. Purification by silica gel HPLC (1-5%
MeOH/CH2Cl2 v/v, containing 0.1% of pyridine) gave compound 11 (495 mg, 75%) as
a white solid material. 31P NMR (DMSO-d6): δ 148.90.

Synthesis of LNA-D

Taking advantage of the high availability of the natural deoxy- and riboguanosines, a
number of effective methods were developed for their conversion into 2,6-diaminopurine (D)
nucleosides (Fathi et al., Tetrahedron Lett. 31:319, 1990; Gryaznov et al., Tetrahedron Lett.,
35:2489, 1994; and Lakshman et al., Org. Lett., 2:927, 2000). However, the production of
LNA-G nucleoside is a multi-step synthetic procedure.


Scheme for Synthesis of LNA-G 



For the synthesis of LNA-D nucleoside, a novel synthesis method was developed that
employed a common convergent scheme, related to the strategy used earlier for the synthesis of
its anhydrohexitol counterpart (Boudou et al., Nucleic Acids Res. 27:1450, 1999). In
particular, a properly protected carbohydrate unit was conjugated with 6-chloro-2-aminopurine
to give a stable 6-chloro intermediate derivative (scheme below), which was further converted
into the desired diaminopurine nucleoside.


Thus, it was shown that glycosylation of 6-chloro-2-aminopurine with compound 4 resulted in
highly stereoselective formation of the nucleoside derivative 14. To promote the ring closing
reaction, a solution of 14 in aqueous 1,4-dioxane was treated with 10-fold excess of sodium
hydroxide to give bicyclic compound 15 in 87% yield. The standard reaction with sodium
benzoate in hot DMF was then successfully applied for displacement of the 5'-mesylate of 15.
Notably, this reaction proceeded in a very selective manner, and no side products originating from
modification of the nucleobase were detected. The desired compound 16 was precipitated
from the reaction mixture after addition of water. In order to introduce the 6-amino group into
the nucleobase structure, the intermediate 6-azido derivative 17 was synthesized via reaction of 16 with
sodium azide. The nucleoside derivative 18 was isolated as a crystalline compound after
saponification of the 5'-benzoate of 17. Subsequent catalytic hydrogenation of 18 on palladium
hydroxide resulted in simultaneous reduction of the 6-azido and 3'-benzyl groups to give LNA-D
diol 19 after crystallization from water. By the use of the peracylation method, the 2- and 6-amino
groups of 19 were benzoylated at the next step to give the nucleobase protected derivative 20,
which was further converted in the standard way into phosphoramidite monomer 21.
This phosphoramidite has been produced in a quantity of 0.5 grams.

Synthesis of LNA-Dᵃ

ᵃ Keys: (a) 6-chloro-2-aminopurine, BSA, TMSOTf, 1,2-dichloroethane; 90%;
(b) NaOH, 1,4-dioxane, H2O; 87%; (c) NaOBz, DMF; (d) NaN3, DMSO; (e) NaOH,
EtOH; 79% (three steps); (f) 10% Pd/C, HCO2NH4, MeOH, H2O; 84%; (g) 1. BzCl,
pyridine; 2. NaOH, EtOH, pyridine; 62%; (h) DMT-Cl, pyridine; 80%; (i) 2-cyanoethyl-
N,N-diisopropylphosphoramidochloridite, DIPEA, DMF; 74%.

Exemplary Analytical Data

Data for compound 19 includes the following: 1H NMR (DMSO-d6): δ 7.81 (s, 1H),
6.78 (br s, 2H), 5.91 (br s, 2H), 5.71 (s, 1H), 5.66 (br s, 1H), 5.04 (br s, 1H), 4.31 (s, 1H), 4.20
(s, 1H), 3.90 (d, J = 7.7 Hz, 1H), 3.77 (m, 2H), 3.73 (d, J = 7.7 Hz, 1H). 13C NMR (DMSO-d6):
δ 160.5, 156.2, 150.9, 134.2, 113.4, 88.3, 85.0, 79.3, 71.5, 70.0, 56.8. MALDI-MS m/z: 295.0
(M+H)+. Anal. Calcd for C11H14N6O4·1.5 H2O: C, 41.12; H, 5.33; N, 26.15. Found: C, 41.24; H,
5.19; N, 25.80.
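
The calculated elemental analysis for compound 19 can be reproduced from its formula as a sesquihydrate (C11H14N6O4 with 1.5 H2O). The sketch below is an illustrative check using standard average atomic masses; the function name and the dictionary representation of the formula are assumptions made for this example.

```python
# Standard average atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def percent_composition(counts, waters=0.0):
    """Mass percentages of each element for a formula with `waters` H2O of hydration."""
    totals = dict(counts)
    totals["H"] = totals.get("H", 0) + 2 * waters
    totals["O"] = totals.get("O", 0) + waters
    molar_mass = sum(ATOMIC_MASS[el] * n for el, n in totals.items())
    return {el: 100.0 * ATOMIC_MASS[el] * n / molar_mass for el, n in totals.items()}


# Compound 19 as written in the analytical data: C11H14N6O4 with 1.5 H2O.
calc = percent_composition({"C": 11, "H": 14, "N": 6, "O": 4}, waters=1.5)
print({el: round(pct, 2) for el, pct in calc.items() if el in ("C", "H", "N")})
```

The result matches the calculated values quoted above for C, H, and N to within rounding.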

The 31P NMR (DMSO-d6) spectrum for compound 24 contained signals at δ 149.19 and
148.98.

Data for compound 23 includes the following: crystallized from MeOH. mp 227.5-229
°C (dec). 1H NMR (DMSO-d6): δ 8.60 (s, 1H), 8.15 (s, 1H), 6.64 (br s, 2H), 5.82 (s, 1H), 5.71
(br s, 1H), 5.04 (br s, 1H), 4.40 (s, 1H), 4.21 (s, 1H), 3.92 (d, J = 7.7 Hz, 1H), 3.79 (m, 2H),
3.75 (d, J = 7.7 Hz, 1H). 13C NMR (DMSO-d6): δ 160.6, 152.0, 149.4, 139.3, 127.1, 88.6, 84.8,
79.1, 71.6, 70.2, 56.8. MALDI-MS m/z: 334.7 (M+H)+.

For protected compound 23, the 31P NMR (DMSO-d6) spectrum has signals at δ 148.93
and 148.85.

Exemplary Experimental Conditions

(1S,3R,4R,7S)-3-(2-amino-6-chloropurin-9-yl)-[...]
dioxabicyclo[2.2.1]heptane (15)

To a solution of compound 14 (40 g, 64.5 mmol) in 1,4-dioxane (300 mL) was added 1
M NaOH (350 mL). The mixture was stirred for one hour at 0 °C, neutralized with AcOH (40
mL), and washed with CH2Cl2 (2 x 200 mL). The combined organic layers were dried (Na2SO4)
and concentrated under reduced pressure. The solid residue was purified by silica gel flash
chromatography to give compound 15 (27.1 g, 87%) as a white solid material. 1H NMR
(CDCl3): δ 7.84 (s, 1H), 7.32-7.26 (m, 5H), 5.91 (s, 1H), 4.73 (s, 1H), 4.66 (d, J = 11.7 Hz, 1H),
4.61 (d, J = 11.7 Hz, 1H), 4.59 (s, 2H), 4.31 (s, 1H), 4.18 (d, J = 8.0 Hz, 2H), 3.99 (d, J = 7.9
Hz, 1H), 3.05 (s, 3H). 13C NMR (CDCl3): δ 158.9, 152.2, 151.4, 139.1, 136.4, 128.4, 128.2,
127.7, 125.3, 86.5, 85.2, 77.2, 76.8, 72.4, 72.1, 64.0, 37.7. MALDI-MS m/z: 482.1 [M+H]+.

(1S,3R,4R,7S)-3-(2-amino-6-chloropurin-9-yl)-1-benzoyloxymethyl-7-benzyloxy-2,5-
dioxabicyclo[2.2.1]heptane (16)

A mixture of sodium benzoate (7.78 g, 54 mmol) and compound 15 (13 g, 27 mmol) was
suspended in anhydrous DMF (150 mL) and stirred for two hours at 105 °C. Ice-cold water (500
mL) was added to the solution under intensive stirring. The precipitate was filtered off, washed
with water, and dried in vacuo. The intermediate product 16 (8 g) was used for the next step without
further purification. An analytical sample was additionally purified by silica gel HPLC (0-2%
MeOH/CH2Cl2 v/v). 1H NMR (CDCl3): δ 7.98-7.95 (m, 2H), 7.79 (s, 1H), 7.62-7.58 (m, 1H),
7.48-7.44 (m, 2H), 7.24 (m, 5H), 5.93 (s, 1H), 4.80 (d, J = 12.6 Hz, 1H), 4.77 (s, 1H), 4.67 (d, J
= 11.9 Hz, 1H), 4.65 (d, J = 12.6 Hz, 1H), 4.56 (d, J = 11.9 Hz, 1H), 4.27 (d, J = 8.0 Hz, 1H),
4.25 (s, 1H), 4.08 (d, J = 7.9 Hz, 1H). 13C NMR (CDCl3): δ 165.7, 158.8, 152.1, 151.3, 138.9,
136.4, 133.4, 129.4, 129.0, 128.5, 128.4, 128.2, 127.6, 125.4, 86.4, 85.7, 77.2, 76.7, 72.5, 72.3,
59.5. MALDI-MS m/z: 508.0 [M+H]+.

(1S,3R,4R,7S)-3-(2-amino-6-azidopurin-9-yl)-7-benzyloxy-1-hydroxymethyl-2,5-
dioxabicyclo[2.2.1]heptane (18)

The entire amount of compound 16 from the previous experiment was dissolved in
anhydrous DMSO (100 mL), and NaN3 (5.4 g, 83 mmol) was added. The mixture was stirred for
two hours at 100 °C and cooled to room temperature. Water (400 mL) was added, and the
mixture was stirred for 30 minutes at 0 °C (ice-bath) to give a yellowish precipitate 17. The
precipitate was filtered off, washed with water, and dissolved in THF (25 mL). 2 M NaOH (30
mL) was then added to the solution, and after 15 minutes of stirring the mixture was neutralized
with AcOH (4 mL). The mixture was concentrated to approximately 1/2 of its volume and
cooled in an ice-bath. The title compound was collected by filtration, washed with cold water,
and dried in vacuo. Yield: 8.8 g (79% from 15). 1H NMR (DMSO-d6): δ 8.53 (br s, 2H), 8.23
(s, 1H), 7.31-7.26 (m, 5H), 6.00 (s, 1H), 5.26 (t, J = 5.7 Hz, 1H), 4.76 (s, 1H), 4.64 (s, 1H), 4.31
(s, 1H), 3.99 (d, J = 7.9 Hz, 1H), 3.88-3.85 (m, 3H). 13C NMR (DMSO-d6): δ 146.0, 144.0,
143.8, 137.9, 137.0, 128.3, 127.7, 127.6, 112.3, 88.3, 85.6, 77.1, 77.0, 72.2, 71.4, 56.8.
MALDI-MS m/z: 384.7 [M+H]+ for 2,6-diaminopurine product, 410.5 [M+H]+. Anal. Calcd for
C18H18N8O4: C, 52.68; H, 4.42; N, 27.30. Found: C, 52.62; H, 4.36; N, 26.94.
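
The reagent quantities in the two preceding procedures pair a mass with a millimole amount, and the two can be cross-checked by a simple conversion. The sketch below is illustrative only; the molar masses for sodium benzoate (144.10 g/mol) and sodium azide (65.01 g/mol) are standard values supplied here rather than taken from the source, and the function name is hypothetical.

```python
def mmol_from_grams(mass_g, molar_mass_g_per_mol):
    """Convert a mass in grams to an amount in millimoles."""
    return 1000.0 * mass_g / molar_mass_g_per_mol


# Standard molar masses (g/mol); values assumed for this check.
SODIUM_BENZOATE = 144.10
SODIUM_AZIDE = 65.01

naobz_mmol = mmol_from_grams(7.78, SODIUM_BENZOATE)   # text states 54 mmol
nan3_mmol = mmol_from_grams(5.4, SODIUM_AZIDE)        # text states 83 mmol

print(f"sodium benzoate: {naobz_mmol:.0f} mmol, "
      f"{naobz_mmol / 27:.1f} equiv relative to 27 mmol of compound 15")
print(f"sodium azide: {nan3_mmol:.0f} mmol")
```

Both conversions agree with the millimole amounts given in the procedures, and sodium benzoate is used at about two equivalents relative to compound 15.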

(1S,3R,4R,7S)-3-(2,6-Diaminopurin-9-yl)-[...]
dioxabicyclo[2.2.1]heptane (19)

To a suspension of compound 18 (8 g, 19.5 mmol) in MeOH (100 mL) were added
Pd(OH)2/C (20%, 5.5 g) and HCO2NH4 (3 g). The mixture was refluxed for 30 minutes and
more HCO2NH4 (3 g) was added. After refluxing for a further 30 minutes, the catalyst was filtered
off and washed with boiling MeOH/H2O (1/1 v/v, 200 mL). The combined filtrates were
concentrated to approximately 100 mL and cooled in an ice-bath. The precipitate was filtered
off, washed with ice-cold H2O and dried in vacuo to give compound 19 (5.4 g, 94%) as a white
solid material. 1H NMR (DMSO-d6): δ 7.81 (s, 1H), 6.78 (br s, 2H), 5.91 (br s, 2H), 5.71 (s,
1H), 5.66 (br s, 1H), 5.04 (br s, 1H), 4.31 (s, 1H), 4.20 (s, 1H), 3.90 (d, J = 7.7 Hz, 1H), 3.77
(m, 2H), 3.73 (d, J = 7.7 Hz, 1H). 13C NMR (DMSO-d6): δ 160.5, 156.2, 150.9, 134.2, 113.4,
88.3, 85.0, 79.3, 71.5, 70.0, 56.8. MALDI-MS m/z: 295.0 (M+H)+. Anal. Calcd for
C11H14N6O4·1.5 H2O: C, 41.12; H, 5.33; N, 26.15. Found: C, 41.24; H, 5.19; N, 25.80.

(1S,3R,4R,7S)-3-(2,6-Di-(N-benzoyl[...]
dioxabicyclo[2.2.1]heptane (20)

A solution of compound 19 (0.5 g, 1.7 mmol) in anhydrous pyridine (20 mL) was
cooled in an ice-bath, and benzoyl chloride (1.5 mL, 12.9 mmol) was added under intensive
stirring. The mixture was allowed to warm to room temperature and was stirred overnight.
Ethanol (20 mL) and 2 M NaOH (20 mL) were added, and the mixture was stirred for an
additional hour. EtOAc (75 mL) was added, and the solution was washed with water (2 x 50
mL). The combined aqueous layers were washed with CH2Cl2 (2 x 50 mL). The combined
organic phases were dried (Na2SO4) and concentrated under reduced pressure to a solid residue.
The residue was suspended in Et2O (75 mL, under refluxing for 30 minutes) and cooled in an
ice-bath. The product was collected by filtration, washed with cold Et2O, and dried in vacuo to
give compound 20 (530 mg, 62%) as a slightly yellow solid material.

(1R,3R,4R,7S)-3-(2,6-Di-(N-benzoylamino)purin-9-yl)-1-(4,4'-dimethoxytrityloxymethyl)-7-
hydroxy-2,5-dioxabicyclo[2.2.1]heptane (21)

Compound 20 (530 mg, 1.06 mmol) was co-evaporated with anhydrous pyridine (2 x 20
mL) and dissolved in anhydrous pyridine (10 mL). DMT-Cl (600 mg, 1.77 mmol) was added,
and the solution was stirred overnight at rt. The mixture was diluted with EtOAc (100 mL) and
washed with saturated NaHCO3 (100 mL) and brine (50 mL). The organic layer was dried over
Na2SO4 and concentrated under reduced pressure. Purification by silica gel HPLC (20-100%
EtOAc/hexane v/v, containing 0.1% of pyridine) gave compound 21 (670 mg, 79%) as a white
solid material. 1H NMR (CD3OD): δ 8.41 (s, 1H), 8.15-8.03 (m, 4H), 7.71-7.22 (m, 15H), 6.92-
6.86 (m, 4H), 6.23 (s, 1H), 4.77 (s, 1H), 4.62 (s, 1H), 4.03 (d, J = 7.9 Hz, 1H), 3.99 (d, J = 7.9
Hz, 1H), 3.79 (s, 6H), 3.67 (d, J = 10.9 Hz, 1H), 3.54 (d, J = 10.8 Hz, 1H). MALDI-MS m/z:
826 (M+Na)+. Anal. Calcd for C46H40N6O8·H2O: C, 67.14; H, 5.14; N, 10.21. Found: C, 67.24;
H, 4.97; N, 10.11.

(1R,3R,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-3-(2,6-di-(N-
benzoylamino)purin-9-yl)-1-(4,4'-dimethoxytrityloxymethyl)-2,5-dioxabicyclo[2.2.1]heptane
(21)

To a stirred solution of compound 20 (640 mg, 0.8 mmol) in anhydrous DMF (5 mL)
were added DIPEA (420 µL, 2.4 mmol) and 2-cyanoethyl-N,N-
diisopropylphosphoramidochloridite (300 µL, 1.2 mmol). The mixture was stirred for 1.5 hours
at room temperature, diluted with EtOAc (100 mL), and washed with saturated NaHCO3 (2 x
100 mL) and brine (50 mL). The organic layer was dried (Na2SO4) and concentrated under reduced
pressure to give a yellow solid residue. Purification by silica gel HPLC (20-100%
EtOAc/hexane containing 0.1% of pyridine) gave compound 21 (590 mg, 74%) as a white
solid material. 31P NMR (DMSO-d6): δ 149.19, 148.98.

Synthesis of Pac-protected LNA-D amidite


The following scheme illustrates a method for synthesizing a Pac-protected version of
LNA-D amidite.

[Scheme: Pac-protected LNA-D amidite (NHPac-protected nucleobase). 20: R = H; 21: R = P(O(CH2)2CN)N(iPr)2]

Compound 17

Compound 7 (1 g, 3.39 mmol) was co-evaporated with anhydrous DMF (2 x 10 mL) and
dissolved in DMF (10 mL). Imidazole (0.69 g, 10.17 mmol) and 1,3-dichloro-1,1,3,3-
tetraisopropyldisiloxane (1.4 mL, 4.37 mmol) were added, and the mixture was stirred
overnight. H2O (100 mL) was added under intensive stirring to precipitate nucleoside material.
The precipitate was filtered off, washed with H2O, and dried in vacuo. Crystallization from
ethanol gave compound 17 (1.15 g, 63%) as a white solid material. MALDI-MS: m/z 537.3
(M+H)+.

Compound 18

To a solution of compound 17 (1.15 g, 2.14 mmol) in anhydrous pyridine (5 mL) was
added phenoxyacetic anhydride (2 g, 7.0 mmol), and the mixture was stirred for four hours.
EtOAc (100 mL) was added, and the solution was washed with sat. NaHCO3 (2 x 100 mL) and
brine (50 mL), dried (Na2SO4), and concentrated to a solid residue. Purification by silica gel
HPLC (50-100% v/v EtOAc/hexane) gave compound 18 (1.65 g, 95%) as a white solid
material. MALDI-MS: m/z 827.3 (M+Na)+.

(1S,3R,4R,7S)-3-(2,6-Di-(N-phenoxy[...]
dioxabicyclo[2.2.1]heptane (19)

To a solution of compound 18 (0.96 g, 1.19 mmol) in anhydrous THF (10 mL) was
added Et3N·3HF (0.2 mL), and the mixture was stirred overnight at room temperature. The
precipitate that formed was collected by filtration and washed with THF (5 mL) and pentane (5 mL)
to give, after drying, compound 19 (650 mg, 97%) as a white solid material. MALDI-MS: m/z
563.0 (M+H)+.

(1R,3R,4R,7S)-3-(2,6-Di-(N-phenoxyacetylamino)-purin-9-yl)-1-(4,4'-
dimethoxytrityloxymethyl)-7-hydroxy-2,5-dioxabicyclo[2.2.1]heptane (20)

To a solution of compound 19 (650 mg, 1.15 mmol) was added DMT-Cl (500 mg, 1.48
mmol). The mixture was stirred for five hours, diluted with EtOAc (100 mL), and washed with
sat. NaHCO3 (2 x 100 mL). The organic layer was dried and concentrated to a solid residue.
Crystallization from EtOAc gave compound 20 (810 mg, 81%) as a white solid material.

(1R,3R,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-3-(2,6-di-(N-
phenoxyacetylamino)-purin-9-yl)-1-(4,4'-dimethoxytrityloxymethyl)-2,5-
dioxabicyclo[2.2.1]heptane (21)

To a solution of compound 20 (800 mg, 0.92 mmol) in anhydrous DMF (10 mL) were
added a 0.75 M solution of DCI in EtOAc (0.7 mL) and 2-cyanoethyl
tetraisopropylphosphorodiamidite (0.32 mL, 1.01 mmol). The mixture was stirred at room
temperature overnight, and EtOAc (75 mL) was added. The resulting solution was washed with
sat. NaHCO3 and brine, dried, and concentrated to a solid residue. Purification by silica gel
HPLC (30-100% v/v EtOAc/hexane, containing 0.1% of pyridine) gave phosphoramidite 21
(550 mg, 56%) as a white solid material.
31P NMR (DMSO-d6): δ 149.08, 148.8.

Synthesis of LNA-2AP

The intermediate derivative 16 was also used for the synthesis of the LNA-2AP nucleoside.
First, the 5'-O-benzoyl group of 16 was hydrolyzed by aqueous sodium hydroxide to give the
nucleoside derivative 22 in 72% yield. The conditions of catalytic transfer hydrogenation
usually used for removal of the 3'-O-benzyl group turned out to be suitable for complete
dechlorination of the nucleobase of 22. Thus, the totally deprotected LNA-2AP nucleoside 23 was
afforded in high yield after refluxing the methanolic solution of 22 in the presence of
palladium hydroxide and ammonium formate. The 2-amine of 23 was selectively protected with
an amidine group after treatment with N,N-dimethylformamide dimethyl acetal. The resulting
diol 24 was then 5'-O-DMT protected and 3'-O-phosphitylated to yield the desired
phosphoramidite LNA-2AP monomer 25 (McBride et al., J. Am. Chem. Soc. 108:2040, 1986).

Synthesis of LNA-2AP^a

[Reaction scheme: structures not recoverable from the scan.]

^a Keys: (a) NaOH, 1,4-dioxane, H2O; 72%; (b) 20% Pd(OH)2/C, HCO2NH4, MeOH,
H2O; 89%; (c) N,N-dimethylformamide dimethyl acetal, DMF; (d) DMT-Cl,
pyridine; 87% (two steps); (e) 2-cyanoethyl-N,N-
diisopropylphosphoramidochloridite, DIPEA, DMF; 64%.

Exemplary Experimental Conditions 
(1S,3R,4R,7S)-3-(2-amino-6-chloropurin-9-yl)[...]
dioxabicyclo[2.2.1]heptane (22)

To a solution of compound 16 (3 g, 5.92 mmol) in 1,4-dioxane (20 mL) was added 2 M
NaOH (20 mL) and the mixture was stirred for one hour. AcOH (3 mL) was added, and the
solvents were removed under reduced pressure. The solid residue was re-dissolved in 20%
MeOH/EtOAc (50 mL), washed with NaHCO3 (2 x 50 mL), dried (Na2SO4), and concentrated to a
solid residue. The residue was purified by silica gel column chromatography (1-2%
MeOH/EtOAc v/v) to give compound 22 (1.72 g, 72%) as a white solid material.

(1S,3R,4R,7S)-3-(2-aminopurin-9-yl)-7-hydroxy-1-hydroxymethyl-2,5-
dioxabicyclo[2.2.1]heptane (23)

To a solution of compound 22 (0.72 g, 1.79 mmol) in MeOH/dioxane (1/1 v/v) were
added Pd(OH)2/C (20%, 0.5 g) and HCO2NH4 (1.5 g, 23.8 mmol). The mixture was stirred
under reflux for 30 minutes and cooled to room temperature. The catalyst was filtered off
and washed with MeOH. The combined filtrates were concentrated under reduced pressure to
yield compound 23 (0.44 g, 89%) as a white solid material. An analytical sample was crystallized
from MeOH. mp 227.5-229 °C (dec). 1H NMR (DMSO-d6): δ 8.60 (s, 1H), 8.15 (s, 1H), 6.64
(br s, 2H), 5.82 (s, 1H), 5.71 (br s, 1H), 5.04 (br s, 1H), 4.40 (s, 1H), 4.21 (s, 1H), 3.92 (d, J =
7.7 Hz, 1H), 3.79 (m, 2H), 3.75 (d, J = 7.7 Hz, 1H). 13C NMR (DMSO-d6): δ 160.6, 152.0,
149.4, 139.3, 127.1, 88.6, 84.8, 79.1, 71.6, 70.2, 56.8.
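
The yield reported for compound 23 can be cross-checked with a short calculation. The sketch below is illustrative only: the molecular formula C11H13N5O4 is an assumption derived from the compound name (the text does not state it), the atomic masses are rounded standard values, and the function names are hypothetical.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula_counts):
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula_counts.items())


def percent_yield(product_g, product_molar_mass, limiting_mmol):
    """Percent yield from product mass (g) and limiting reagent amount (mmol)."""
    product_mmol = product_g / product_molar_mass * 1000.0
    return 100.0 * product_mmol / limiting_mmol


# Assumed formula for compound 23 (derived from its name, not given in the text).
COMPOUND_23 = {"C": 11, "H": 13, "N": 5, "O": 4}

mw_23 = molar_mass(COMPOUND_23)
# 0.44 g of 23 obtained from 1.79 mmol of 22, as stated in the procedure.
yield_23 = percent_yield(0.44, mw_23, 1.79)
print(f"M(23) = {mw_23:.2f} g/mol, yield = {yield_23:.1f}%")
```

Under these assumptions the calculation gives roughly 88%, in line with the reported 89% once the rounding of the weighed masses is taken into account.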
(1R,3R,4R,7S)-1-(4,4'-dimethoxytrityloxymethyl)-3-(2-N-
(dimethylaminomethylidene)aminopurin-9-yl)-7-hydroxy-2,5-dioxabicyclo[2.2.1]heptane (5'
DMT protected version of 24)

Compound 23 (0.4 g, 1.43 mmol) was co-evaporated with anhydrous DMF (10 mL) and
dissolved in DMF (15 mL). N,N-Dimethylformamide dimethyl acetal (0.8 mL) was added and
the solution was stirred for three days at room temperature. Water (5 mL) was added, and the
solvents were removed under reduced pressure. The solid residue was co-evaporated with
anhydrous pyridine (2 x 10 mL) and dissolved in anhydrous pyridine (5 mL). DMT-Cl (0.7 g,
2.1 mmol) was added, and the solution was stirred for four hours, diluted with EtOAc (50 mL), and
washed with NaHCO3 (2 x 50 mL) and brine (50 mL). The organic layer was dried (Na2SO4) and
concentrated to a yellow solid residue. Purification by silica gel HPLC (1-6% MeOH/CH2Cl2
v/v, containing 0.1% of pyridine) gave the 5' DMT protected version of compound 24 (0.87 g,
87%) as a white solid material.

(1R,3R,4R,7S)-7-(2-Cyanoethoxy(diisopropylamino)phosphinoxy)-1-(4,4'-
dimethoxytrityloxymethyl)-3-(2-[...]
dioxabicyclo[2.2.1]heptane (25)

The 5' DMT protected version of compound 24 (0.5 g, 0.79 mmol) was dissolved in
anhydrous DMF (10 mL) and DIPEA (350 µL) and 2-cyanoethyl-N,N-
diisopropylphosphoramidochloridite (250 µL) were added. The mixture was stirred for one
hour, diluted with EtOAc (50 mL), washed with saturated NaHCO3 (2 x 100 mL) and brine (50
mL), dried (Na2SO4), and concentrated to a solid residue. Purification by silica gel HPLC (0-
3% MeOH/CH2Cl2 v/v, containing 0.1% of pyridine) gave compound 25 (0.42 g, 64%) as a
white solid material. 31P NMR (DMSO-d6): δ 148.93, 148.85.

Synthesis of Oligomers

Along with previously described LNA phosphoramidites (Koshkin et al., supra; and
Pedersen et al., Synthesis p. 802, 2002), the phosphoramidite monomers 11, 21, and 25 were
successfully applied for automated oligonucleotide synthesis (Caruthers, Acc. Chem. Res.
24:278, 1991) to produce the LNA oligomers depicted in Table 9. Oligonucleotide syntheses
were performed on a 0.2 µmol scale using an Expedite synthesizer (Applied Biosystems) with
the recommended commercial reagents. Standard protocols for DNA synthesis were used,
except that the coupling time was extended to 5 minutes and the oxidation time was extended to
30-second cycles. Deprotection of the oligonucleotides was performed by treatment with
concentrated ammonium hydroxide for five hours at 60 °C. After that, the LNA-D containing
oligonucleotides were additionally treated with AMA (concentrated ammonium hydroxide /
40% aqueous MeNH2; 1/1 v/v) for one hour at 60 °C. All the synthesized oligonucleotides were
purified by RP-HPLC, and their structures were verified by MALDI-TOF mass spectra.

The complexing properties of oligonucleotides containing new LNA monomers 1-3
were assessed. Comparative binding data from an 8-mer LNA sequence are shown in Table 9 as
the melting temperatures against complementary single stranded DNA. An exemplary
sequence for this comparison is GACATAGG, which is the central part of a capture probe used
for SNP detection in GluclVS7-7asA (A:a mismatch position). The thermal stabilities of
reference DNA duplexes (entries 1-7, Table 9) can be directly compared with their LNA
counterparts (entries 8-14). The hybridizing ability of all LNA 8-mers is superior to that of
isosequential DNA oligonucleotides. The average melting temperatures of DNA and LNA 8-
mers against complementary DNAs typically differ by about 40 °C. The replacement of one
internal LNA-A nucleotide by LNA-D resulted in the further stabilization of the complementary
duplex (i.e., compare entries 8 and 11) by 6.2 °C. Interestingly, the analogous replacement
made in a DNA octamer destabilized the corresponding duplex by 0.5 °C (i.e., entries 1 and
4). D-nucleosides may facilitate a B to A helix transition, because the A-type structure of an
LNA:DNA duplex is more suitable for effective D:t pairing. This stabilizing effect is expected
to be even more pronounced for LNA:RNA duplexes, which can be very useful for construction
of antisense or other gene-silencing reagents. The mismatch discrimination ability of the D-
nucleoside was also studied (entry 11). In comparison to LNA-A (entry 8), the D-nucleoside
demonstrated remarkably increased mismatch discrimination against the DNA-g nucleoside.
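
The differences quoted in the preceding paragraph follow directly from the Table 9 values. The sketch below is a minimal illustration, not part of the original procedure: the dictionary layout and function name are hypothetical, and the entry 4 value is taken as 23.3 °C (the figure given for the same duplex in Table 11).

```python
# Tm values in °C from Table 9, keyed by entry number.
# "t" = duplex with the fully complementary strand 3'-ctgtatcc,
# "g" = duplex with the singly mismatched strand 3'-ctggatcc.
TABLE_9 = {
    1: {"t": 23.8},               # DNA 5'-gacatagg
    4: {"t": 23.3},               # DNA 5'-gacdtagg
    8: {"t": 61.6, "g": 43.4},    # LNA 5'-GACATAGG
    11: {"t": 67.8, "g": 41.4},   # LNA 5'-GACDTAGG
}


def tm_difference(entry_a, entry_b, column="t"):
    """Tm of entry_b minus Tm of entry_a for the same target column."""
    return round(TABLE_9[entry_b][column] - TABLE_9[entry_a][column], 1)


# Effect of replacing A with D in the LNA and DNA octamers.
lna_a_to_d = tm_difference(8, 11)
dna_a_to_d = tm_difference(1, 4)

# Discrimination against the g mismatch: matched Tm minus mismatched Tm.
discrimination_lna_a = round(TABLE_9[8]["t"] - TABLE_9[8]["g"], 1)
discrimination_lna_d = round(TABLE_9[11]["t"] - TABLE_9[11]["g"], 1)

print(lna_a_to_d, dna_a_to_d, discrimination_lna_a, discrimination_lna_d)
```

With these inputs the LNA replacement gives +6.2 °C, the DNA replacement gives -0.5 °C, and the matched-minus-mismatched difference against g rises from 18.2 °C for LNA-A to 26.4 °C for LNA-D.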

Table 9. Melting temperatures (Tm) of the complementary DNA-DNA and LNA-DNA
duplexes.^a Modified monomers (LNA are in CAPITALS): I = inosine; D = 2,6-diaminopurine;
X = 2-aminopurine.

Entry  Oligonucleotide  Tm (± 0.5 °C) of the duplexes with complementary deoxynucleotide
       structure        3'-ctgtatcc    3'-ctgaatcc    3'-ctggatcc    3'-ctgcatcc
1      5'-gacatagg      23.8           <10            <10            <10
2      5'-gacttagg      <10            22.6           <10            <10
3      5'-gacgtagg      <10            <10            <10            25.0
4      5'-gacdtagg      23.3           <10            <10            <10
5      5'-gdcdtdgg      33.4           <10            <10            17.7
6      5'-gacitagg      <10            <10            <10            20.9
7      5'-gacxtagg      <10            <10            <10            <10
8      5'-GACATAGG      61.6           38.2           43.4           40.6
9      5'-GACTTAGG      28.0           60.7           36.4           23.5
10     5'-GACGTAGG      55.0           32[illegible]  [illegible]    70.9
11     5'-GACDTAGG      67.8           42.2           41.4           52.4
12     5'-GDCDTDGG      78.3           55.9           54.7           63.8
13     5'-GACITAGG      53.1           48.2           43.0           59.9
14     5'-GACXTAGG      60.8           45.5           44.0           53.9

^a The melting temperatures (Tm values) were obtained as the maxima of the first derivative of the
corresponding melting curves (optical density at 260 nm versus temperature). Concentration of
the duplexes: 2.5 µM. Buffer: 0.1 M NaCl; 10 mM Na-phosphate (pH 7.0); 1 mM EDTA.
^b Low cooperativity of transitions (accuracy ± 1 °C).

Table 10. The mismatch discrimination effect of the chimeric LNA-DNA 12-mers containing
LNA-A or LNA-D nucleosides against the point of mutation.^a

Structure of LNA-DNA   Tm (± 0.5 °C) of the complementary duplexes with DNA oligonucleotides
oligonucleotide        (ΔTm between singly mismatched and perfect duplexes)

HNFasl28A-2            caacatcccaca     caacaacccaca
tGtggGATGttg           61.0             45.9 (-15.1)
tGtggGDTGttg           65.5             49.7 (-15.8)

Gluc53as-A             aagagtccagtg     aagaggccagtg
cAmCtgGAmCtctt         61.5             50.6 (-10.9)
cAmCtgGDmCtctt         65.3             45.4 (-19.9)

^a Concentration of duplexes: 2 µM; Buffer: see Table 9.
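
The ΔTm values in parentheses in Table 10 are the mismatched Tm minus the perfectly matched Tm. The sketch below recomputes them and compares each LNA-D probe with its LNA-A counterpart; the class and variable names are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeTm:
    """Melting temperatures (°C) of one probe from Table 10."""
    probe: str
    tm_match: float
    tm_mismatch: float

    @property
    def delta_tm(self) -> float:
        # Negative values mean the mismatched duplex melts at a lower temperature.
        return round(self.tm_mismatch - self.tm_match, 1)


# (LNA-A probe, LNA-D probe) pairs as listed in Table 10.
PAIRS = [
    (ProbeTm("tGtggGATGttg", 61.0, 45.9), ProbeTm("tGtggGDTGttg", 65.5, 49.7)),
    (ProbeTm("cAmCtgGAmCtctt", 61.5, 50.6), ProbeTm("cAmCtgGDmCtctt", 65.3, 45.4)),
]

deltas = [(a.delta_tm, d.delta_tm) for a, d in PAIRS]
# Extra discrimination (°C) gained by replacing LNA-A with LNA-D.
gains = [round(a.delta_tm - d.delta_tm, 1) for a, d in PAIRS]

for (a, d), gain in zip(PAIRS, gains):
    print(f"{a.probe}: {a.delta_tm}  {d.probe}: {d.delta_tm}  gain: {gain}")
```

On these numbers the A-to-D replacement adds 0.7 °C of discrimination for the first probe pair and 9.0 °C for the second.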


Table 11. Melting temperatures of the LNA and DNA duplexes (LNAs are CAPITALIZED)
containing 2-thio-deoxythymidine (s) and diaminopurineriboside (d). See Table 9 for
experimental conditions.

Oligo structure  Tm (± 0.5 °C) of complementary duplexes with
                 3'-ctgtatcc  3'-ctgsatcc  3'-CTGsATCC  3'-CTGtATCC  3'-CTGTATCC
5'-gacatagg      23.8         27           54.4         49.4         54.6
5'-gacdtagg      23.3         <6           45.4         55.2         60S
5'-GACATAGG*     61.6         64.6
5'-GACDTAGG      67.8         59.4

* Tm values in the shaded cells were measured in low salt buffers (1 mM Na-phosphate, pH
7.0). Low cooperativity of the transitions was observed (accuracy ± 1.5 °C).

Example 22: Exemplary Methods for Synthesizing LNA-PyrroloPyr-SBC-C
The furanopyrimidine phosphoramidite 6pC used for incorporation of the pyrroloC
analogue can be synthesized from LNA-U through a series of reactions as illustrated below and
in Fig. 14. Starting from LNA-U 1pC, iodine can be introduced at the 5-position of the
nucleobase (Chang and Welch, J. Med. Chem. 1963, 6, 428). This compound can be used in a
Sonogashira-type palladium coupling reaction (Sonogashira, Tohda and Hagihara, Tetrahedron
Lett. 1975, 4467) resulting in the 5-ethynyl-LNA-U 3pC. The 5-ethynyl-LNA-U 3pC can be
transformed to the furanopyrimidine LNA analogue 4pC when reacted with CuI, and then
transformed into the DMT-protected phosphoramidite 6pC (Woo, Meyer, and Gamper, Nucleic
Acids Res., 1996, 24, 2470). LNA-PyrroloPyr-SBC-C is formed when 6pC or an
oligonucleotide containing 6pC is deprotected with ammonia.





[Reaction scheme showing compounds 1pC, 2pC, 3pC, 4pC, 5pC, and 6pC: structures not
recoverable from the scan; see Fig. 14.]

Example 23: Exemplary Modified Bases such as Universal Bases 

Desirable modified bases are covalently linked to the 1'-position of a furanosyl ring,
particularly to the 1'-position of a 2',4'-linked furanosyl ring, especially to the 1'-position of a
2'-O,4'-C-methylene-beta-D-ribofuranosyl ring.

As discussed above, other desirable modified bases contain one or more carbon alicyclic
or carbocyclic aryl units, i.e. non-aromatic or aromatic cyclic units that contain only carbon
atoms as ring members. Modified bases that contain carbocyclic aryl groups are generally
desirable, particularly a moiety that contains multiple linked aromatic groups, particularly
groups that contain fused rings. That is, optionally substituted polynuclear aromatic groups are
especially desirable such as optionally substituted naphthyl, optionally substituted anthracenyl,
optionally substituted phenanthrenyl, optionally substituted pyrenyl, optionally substituted
chrysenyl, optionally substituted benzanthracenyl, optionally substituted dibenzanthracenyl,
optionally substituted benzopyrenyl, with substituted or unsubstituted pyrenyl being particularly
desirable.

Without being bound by any theory, it is believed that such carbon alicyclic and/or 
carbocyclic aryl modified bases can increase hydrophobic interaction with neighboring bases of 
an oligonucleotide. Those interactions can enhance the stability of a hybridized oligo pair, 
without necessity of interactions between bases of the distinct oligos of the hybridized pair. 

Again without being bound by any theory, it is further believed that such hydrophobic
interactions can be particularly favored by platelike stacking of neighboring bases, i.e.
intercalation. Such intercalation will be promoted if the base comprises a moiety with a
relatively planar extended structure, such as provided by an aromatic group, particularly a
carbocyclic aryl group having multiple fused rings. This is indicated by the increases in Tm
values exhibited by oligos having LNA units with pyrenyl nucleobases relative to comparable
oligos having LNA units with naphthyl nucleobases.

Modified bases that contain one or more heteroalicyclic or heteroaromatic groups also
are suitable for use in LNA units, particularly such non-aromatic and aromatic groups that
contain one or more N, O or S atoms as ring members, particularly at least one sulfur atom,
and from 5 to about 8 ring members. Also desirable is a nucleobase that contains two or more
fused rings, where at least one of the rings is a heteroalicyclic or heteroaromatic group
containing 1, 2, or 3 N, O, or S atoms as ring members.

In general, desirable are modified bases that contain 2, 3, 4, 5, 6, 7 or 8 fused rings,
which may be carbon alicyclic, heteroalicyclic, carbocyclic aryl and/or heteroaromatic; more
desirably modified bases that contain 3, 4, 5, or 6 fused rings, which may be carbon alicyclic,
heteroalicyclic, carbocyclic aryl and/or heteroaromatic, and desirably the fused rings are each
aromatic, particularly carbocyclic aryl.

In some embodiments, the base is not an optionally substituted oxazole, optionally
substituted imidazole, or optionally substituted isoxazole modified base.

Other suitable modified bases for use in LNA units in accordance with the invention
include optionally substituted pyridyloxazole, optionally substituted pyrenylmethylglycerol,
optionally substituted pyrrole, optionally substituted diazole and optionally substituted triazole
groups.

Desirable modified bases of the present invention when incorporated into an
oligonucleotide containing all LNA units or a mixture of LNA and DNA or RNA units will
exhibit substantially constant Tm values upon hybridization with a complementary
oligonucleotide, irrespective of the bases present on the complementary oligonucleotide.

In some embodiments, one or more of the common RNA or commonly used derivatives
thereof, such as 2'-O-methyl, 2'-fluoro, 2'-allyl, and 2'-O-methoxyethoxy derivatives are
combined with at least one nucleotide with a universal base to generate an oligonucleotide
having from five to 100 nucleotides.

Modified nucleic acid compounds may comprise a variety of nucleic acid units e.g.
nucleoside and/or nucleotide units. As discussed above, an LNA nucleic acid unit has a carbon
or hetero alicyclic ring with four to six ring members, e.g. a furanose ring, or other alicyclic
ring structures such as a cyclopentyl, cycloheptyl, tetrahydropyranyl, oxepanyl,
tetrahydrothiophenyl, pyrrolidinyl, thianyl, thiepanyl, piperidinyl, and the like.

In an aspect of the invention, at least one ring atom of the carbon or hetero alicyclic
group is taken to form a further cyclic linkage to thereby provide a multi-cyclic group. The
cyclic linkage may include one or more, typically two atoms, of the carbon or hetero alicyclic
group. The cyclic linkage also may include one or more atoms that are substituents, but not
ring members, of the carbon or hetero alicyclic group.

Unless indicated otherwise, an alicyclic group as referred to herein is inclusive of groups
having all carbon ring members as well as groups having one or more hetero atom (e.g. N, O, S
or Se) ring members. The disclosure of the group as a "carbon or hetero alicyclic group"
further indicates that the alicyclic group may contain all carbon ring members (i.e. a carbon
alicyclic) or may contain one or more hetero atom ring members (i.e. a hetero alicyclic).
Alicyclic groups are understood not to be aromatic, and typically are fully saturated within the
ring (i.e. no endocyclic multiple bonds).

Desirably, the alicyclic ring is a hetero alicyclic, i.e. the alicyclic group has one or more
hetero atom ring members, typically one or two hetero atom ring members such as O, N, S or
Se, with oxygen being often desirable.

The one or more cyclic linkages of an alicyclic group may be comprised completely of
carbon atoms, or generally more desirable, one or more hetero atoms such as O, S, N or Se,
desirably oxygen for at least some embodiments. The cyclic linkage will typically contain one
or two or three hetero atoms, more typically one or two hetero atoms in a single cyclic linkage.

The one or more cyclic linkages of a nucleic acid compound of the invention can have a
number of alternative configurations. For instance, cyclic linkages of
nucleic acid compounds of the invention will include at least one alicyclic ring atom. The
cyclic linkage may be disubstituted to a single alicyclic atom, or two adjacent or non-adjacent
alicyclic ring atoms may be included in a cyclic linkage. Still further, a cyclic linkage may
include a single alicyclic ring atom, and a further atom that is a substituent but not a ring
member of the alicyclic group.
For instance, as discussed above, if the alicyclic group is a furanosyl-type ring, desirable
cyclic linkages include the following: C-1', C-2'; C-2', C-3'; C-2', C-4'; or a C-2', C-5' linkage.

A cyclic linkage will typically comprise, in addition to the one or more alicyclic group
ring atoms, 2 to 6 atoms in addition to the alicyclic ring members, more typically 3 or 4 atoms
in addition to the alicyclic ring member(s).

The alicyclic group atoms that are incorporated into a cyclic linkage are typically carbon
atoms, but hetero atoms such as nitrogen of the alicyclic group also may be incorporated into a
cyclic linkage.

Specifically desirable modified nucleic acids for use in oligonucleotides of the invention
include locked nucleic acids as disclosed in WO 99/14226 (which include bicyclic and tricyclic
DNA or RNA having 2'-4' or 2'-3' sugar linkages); 2'-deoxy-2'-fluoro ribonucleotides; 2'-O-
methyl ribonucleotides; 2'-O-methoxyethyl ribonucleotides; peptide nucleic acids; 5-propynyl
pyrimidine ribonucleotides; 7-deazapurine ribonucleotides; 2,6-diaminopurine ribonucleotides;
and 2-thio-pyrimidine ribonucleotides.

LNA units as disclosed in WO 99/14226 are in general particularly desirable modified
nucleic acids for incorporation into an oligonucleotide of the invention. Additionally, the
nucleic acids may be modified at either the 3' and/or 5' end by any type of modification known
in the art. For example, either or both ends may be capped with a protecting group, attached to
a flexible linking group, attached to a reactive group to aid in attachment to the substrate
surface, etc. Desirable LNA units also are disclosed in WO 00/56746, WO 00/56748, and WO
00/66604.

Desirable syntheses of pyrene-LNA monomers are shown in the following Schemes 1 and
2. In the below Schemes 1 and 2, the compound reference numerals are also referred to in the
examples below.

A wide variety of modified nucleic acids may be employed, including those that have 2'-
modification of hydroxyl, 2'-O-methyl, 2'-fluoro, 2'-trifluoromethyl, 2'-O-(2-methoxyethyl),
2'-O-aminopropyl, 2'-O-dimethylamino-oxyethyl, 2'-O-fluoroethyl or 2'-O-propenyl. The
nucleic acid may further include a 3' modification, desirably where the 2'- and 3'-positions of the
ribose group are linked. The nucleic acid also may contain a modification at the 4'-position,
desirably where the 2'- and 4'-positions of the ribose group are linked such as by a 2'-4' link of a
-CH2-S-, -CH2-NH-, or -CH2-NMe- bridge.

The nucleotide also may have a variety of configurations such as α-D-ribo, β-D-xylo, or
α-L-xylo configuration.

The internucleoside linkages of the units of oligos of the invention may be natural
phosphorodiester linkages, or other linkages such as -O-P(O)2-O-, -O-P(O,S)-O-, -O-P(S)2-O-,
-NR^H-P(O)2-O-, -O-P(O,NR^H)-O-, -O-PO(R")-O-, -O-PO(CH3)-O-, and -O-PO(NHR^N)-O-,
where R^H is selected from hydrogen and C1-4-alkyl, and R" is selected from C1-6-alkyl and
phenyl.

A further desirable group of modified nucleic acids for incorporation into oligomers of
the invention include those of the following formula:

[Structural formula: not recoverable from the scan.]

wherein X is -O-; B is a modified base as discussed above e.g. an optionally substituted
carbocyclic aryl such as optionally substituted pyrene or optionally substituted
pyrenylmethylglycerol, or an optionally substituted heteroalicyclic or optionally substituted
heteroaromatic such as optionally substituted pyridyloxazole. Other desirable universal bases
include pyrrole, diazole or triazole moieties, all of which may be optionally substituted. R^1* is
hydrogen.

P designates the radical position for an internucleoside linkage to a succeeding
monomer, or a 5'-terminal group, such internucleoside linkage or 5'-terminal group optionally
including the substituent R^5, R^5 being hydrogen or included in an internucleoside linkage. R^3*
is a group P* which designates an internucleoside linkage to a preceding monomer, or a 3'-
terminal group. One or two pairs of non-geminal substituents selected from the present
substituents of R^2, R^2*, R^3, and R^4* may designate a biradical consisting of 1-4 groups/atoms
selected from -C(R^aR^b)-, -C(R^a)=C(R^a)-, -C(R^a)=N-, -O-, -S-, -SO2-, -N(R^a)-, and >C=Z. Z is
selected from -O-, -S-, and -N(R^a)-, and R^a and R^b each is independently selected from
hydrogen, optionally substituted C1-6-alkyl, optionally substituted C2-6-alkenyl, hydroxy,
C1-6-alkoxy, C2-6-alkenyloxy, carboxy, C1-6-alkoxycarbonyl, C1-6-alkylcarbonyl, formyl, amino,
mono- and di(C1-6-alkyl)amino, carbamoyl, mono- and di(C1-6-alkyl)-amino-carbonyl,
amino-C1-6-alkyl-aminocarbonyl, mono- and di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl,
C1-6-alkyl-carbonylamino, carbamido, C1-6-alkanoyloxy, sulphono, C1-6-alkylsulphonyloxy,
nitro, azido, sulphanyl, C1-6-alkylthio, halogen, photochemically active groups,
thermochemically active groups, chelating groups, reporter groups, and ligands, the possible
pair of non-geminal substituents thereby forming a monocyclic entity together with (i) the
atoms to which the non-geminal substituents are bound and (ii) any intervening atoms; and each
of the substituents R^2, R^2*, R^3, R^4* which are present and not involved in the possible
biradical is independently selected from hydrogen, optionally substituted C1-6-alkyl, optionally
substituted C2-6-alkenyl, hydroxy, C1-6-alkoxy, C2-6-alkenyloxy, carboxy, C1-6-alkoxycarbonyl,
C1-6-alkylcarbonyl, formyl, amino, mono- and di(C1-6-alkyl)amino, carbamoyl, mono- and
di(C1-6-alkyl)-amino-carbonyl, amino-C1-6-alkyl-aminocarbonyl, mono- and
di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl, C1-6-alkyl-carbonylamino, carbamido,
C1-6-alkanoyloxy, sulphono, C1-6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1-6-alkylthio,
halogen, photochemically active groups, thermochemically active groups, chelating groups,
reporter groups, and ligands; and basic salts and acid addition salts thereof.

Modified nucleobases and nucleosidic bases may comprise a cyclic unit (e.g. a
carbocyclic unit such as pyrenyl) that is joined to a nucleic unit, such as the 1'-position of a
furanosyl ring, through a linker, such as a straight or branched chain alkylene or alkenylene
group. Alkylene groups suitably have from 1 (i.e. -CH2-) to about 12 carbon atoms, more
typically 1 to about 8 carbon atoms, still more typically 1 to about 6 carbon atoms. Alkenylene
groups suitably have one, two or three carbon-carbon double bonds and from 2 to 12 carbon
atoms, more typically 2 to 8 carbon atoms, still more typically 2 to 6 carbon atoms.

Example 24: Exemplary Nucleic Acid Monomers and Oligomers 

Desirable LNA units include those that contain a furanosyl-type ring and one or more of
the following linkages: C-1', C-2'; C-2', C-3'; C-2', C-4'; or a C-2', C-5' linkage. A C-2', C-4'
linkage is particularly desirable. In another aspect of the invention, desirable LNA units are
compounds having a substituent on the 2'-position of the central sugar moiety (e.g., ribose or
xylose), or derivatives thereof, which favors the C3'-endo conformation, commonly referred to
as the North (or simply N for short) conformation. Desirable LNA [...] In various embodiments, the
oligonucleotide has at least one LNA unit with a modified base as disclosed herein. Suitable
oligonucleotides also may contain natural DNA or RNA units (e.g., nucleotides) with natural
bases, as well as LNA units that contain natural bases. Furthermore, the oligonucleotides of the
invention also may contain modified DNA or RNA, such as 2'-O-methyl RNA, with natural or
modified nucleobases (e.g., pyrene). Desirable oligonucleotides contain at least one of and
desirably both of 1) one or more DNA or RNA units (e.g., nucleotides) with natural bases, and
2) one or more LNA units with natural bases, in addition to LNA units with a modified base. In
other embodiments, the nucleic acid does not contain a modified base.

Oligonucleotides of the invention desirably contain at least 50 percent or more, more
desirably 55, 60, 65, or 70 percent or more of non-modified or natural DNA or RNA units (e.g.,
nucleotides) or units other than LNA units based on the total number of units or residues of the
oligo. A non-modified nucleic acid as referred to herein means that the nucleic acid upon
incorporation into a 10-mer oligomer will not increase the Tm of the oligomer in excess of 1 °C
or 2 °C. More desirably, the non-modified nucleic acid unit (e.g., nucleotide) is a substantially
or completely "natural" nucleic acid, i.e. containing a non-modified base of uracil, cytosine, 5-
methyl-cytosine, thymine, adenine or guanine and a non-modified pentose sugar unit of β-D-
ribose (in the case of RNA) or β-D-2-deoxyribose (in the case of DNA).

Oligonucleotides of the invention suitably may contain only a single modified (i.e.
LNA) nucleic acid unit, but desirably an oligonucleotide will contain 2, 3, 4 or 5 or more
modified nucleic acid units. Typically desirable is where an oligonucleotide contains from
about 5 to about 40 or 45 percent modified (LNA) nucleic acid units, based on total units of the
oligo, more desirably where the oligonucleotide contains from about 5 or 10 percent to about
20, 25, 30 or 35 percent modified nucleic acid units, based on total units of the oligo.
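
The percentage ranges above can be evaluated for a given sequence. The following is a minimal sketch under stated assumptions: it adopts the Table 9 and Table 10 convention that CAPITAL letters denote LNA units and lower-case letters denote DNA units, treats every letter as one unit (so it does not handle the two-character "mC" notation), and uses hypothetical function names and a default range of 5 to 45 percent.

```python
def lna_fraction(oligo: str) -> float:
    """Fraction of units written in capitals (LNA) in a one-letter-per-unit sequence."""
    units = [ch for ch in oligo if ch.isalpha()]
    if not units:
        raise ValueError("sequence contains no nucleotide letters")
    return sum(ch.isupper() for ch in units) / len(units)


def within_typical_range(oligo: str, low: float = 0.05, high: float = 0.45) -> bool:
    """True if the LNA content lies in the 'about 5 to about 45 percent' range."""
    return low <= lna_fraction(oligo) <= high


# Chimeric 12-mer from Table 10: 5 LNA units out of 12.
chimeric = "tGtggGATGttg"
print(f"{chimeric}: {lna_fraction(chimeric):.1%} LNA, in range: {within_typical_range(chimeric)}")

# Fully LNA 8-mer from Table 9: outside the typical range for chimeric oligos.
full_lna = "GACATAGG"
print(f"{full_lna}: {lna_fraction(full_lna):.1%} LNA, in range: {within_typical_range(full_lna)}")
```

The thresholds are parameters because the text gives several alternative bounds (for example 5 or 10 percent at the low end and 20 to 45 percent at the high end).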

Typical oligonucleotides that contain one or more LNA units with a modified base as
disclosed herein suitably contain from 3 or 4 to about 200 nucleic acid repeat units, with at least
one unit being an LNA unit with a modified base, more typically from about 3 or 4 to about 5,
6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140 or 150 nucleic acid
units, with 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 LNA units with a modified base being present.

As discussed above, particularly desirable oligonucleotides contain a non-modified
DNA or RNA unit at the 3' terminus and a modified DNA or RNA unit at one position
upstream from (generally referred to herein as the -1 or penultimate position) the 3' terminal
non-modified nucleic acid unit. In some embodiments, the modified base is at the 3' terminal
position of a nucleic acid primer, such as a primer for the detection of a single nucleotide
polymorphism. Other particularly desirable nucleic acids have an LNA unit with or without a
modified base in the 5' and/or 3' terminal position.

Also desirable are oligonucleotides that do not have extended stretches of modified
DNA or RNA units, e.g. greater than about 4, 5 or 6 consecutive modified DNA
or RNA units. That is, desirably one or more non-modified DNA or RNA will be present after
a consecutive stretch of about 3, 4 or 5 modified nucleic acids.
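
The preference for avoiding long consecutive stretches of modified units can be checked mechanically. The sketch below assumes, as in Tables 9 and 10, that CAPITAL letters mark modified (LNA) units and lower-case letters mark non-modified units; the function names and the default limit of 4 are illustrative choices, not values fixed by the text.

```python
from itertools import groupby


def longest_modified_run(oligo: str) -> int:
    """Length of the longest consecutive stretch of capital-letter (modified) units."""
    runs = (
        sum(1 for _ in group)
        for is_modified, group in groupby(oligo, key=str.isupper)
        if is_modified
    )
    return max(runs, default=0)


def has_extended_stretch(oligo: str, limit: int = 4) -> bool:
    """True if some stretch of modified units is longer than `limit`."""
    return longest_modified_run(oligo) > limit


for seq in ("tGtggGATGttg", "GACATAGG", "gacatagg"):
    print(seq, longest_modified_run(seq), has_extended_stretch(seq))
```

For the chimeric 12-mer tGtggGATGttg the longest stretch is four units (GATG), so it passes with a limit of 4, whereas the fully modified 8-mer GACATAGG does not.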

Generally desirable are oligonucleotides that contain a mixture of LNA units that have
non-modified or natural nucleobases (i.e., adenine, guanine, cytosine, 5-methyl-cytosine,
uracil, or thymine) and LNA units that have modified bases as disclosed herein.

Particularly desirable oligonucleotides of the invention include those where an LNA
unit with a modified base is interposed between two LNA units each having non-modified or
natural bases (adenine, guanine, cytosine, 5-methyl-cytosine, uracil, or thymine). The LNA
"flanking" units with natural base moieties may be directly adjacent to the LNA with modified
base moiety, or desirably are within 2, 3, 4 or 5 nucleic acid units of the LNA unit with modified
base. Nucleic acid units that may be spaced between an LNA unit with a modified base and an
LNA unit with natural nucleobases suitably are DNA and/or RNA and/or alkyl-modified
RNA/DNA units, typically with natural base moieties, although the DNA and/or RNA units
also may contain modified base moieties.

The oligonucleotides of the present invention are comprised of at least about one
universal base. Oligonucleotides of the present invention can also be comprised, for example, of
between about one to six 2'-OMe-RNA units, at least about two LNA units and at least about one
LNA pyrene unit.

Example 25: Exemplary Target Nucleic Acids 

In the practice of the present invention, target genes may be suitably single-stranded or
double-stranded DNA or RNA; however, single-stranded DNA or RNA targets are desirable. It
is understood that the targets to which the nucleic acids of the invention are directed include
allelic forms of the targeted gene and the corresponding mRNAs including splice variants.
There is substantial guidance in the literature for selecting particular sequences for nucleic acids
with LNA or other high affinity nucleotides given a knowledge of the sequence of the target
polynucleotide, e.g., Peyman and Ulmann, Chemical Reviews, 90:543-584, 1990; Crooke, Ann.
Rev. Pharmacol. Toxicol., 32:329-376 (1992); and Zamecnik and Stephenson, Proc. Natl. Acad.
Sci., 75:280-284 (1974). Desirable mRNA targets include the 5' cap site, tRNA primer binding
site, the initiation codon site, the mRNA donor splice site, and the mRNA acceptor splice site,
e.g., Goodchild et al., U.S. Patent 4,806,463.

Example 26: Exemplary Applications of Present Methods
The chimeric oligos of the present invention are highly suitable for a variety of
diagnostic purposes such as for the isolation, purification, amplification, detection,
identification, quantification, or capture of nucleic acids such as DNA, mRNA or non-protein
coding cellular RNAs, such as tRNA, rRNA, snRNA and scRNA, or synthetic nucleic acids, in
vivo or in vitro.

The oligomer can comprise a photochemically active group, a thermochemically active
group, a chelating group, a reporter group, or a ligand that facilitates the direct or indirect
detection of the oligomer or the immobilization of the oligomer onto a solid support. Such
groups are typically attached to the oligo when it is intended as a probe for in situ hybridization,
Southern hybridization, dot blot hybridization, reverse dot blot hybridization, or Northern
hybridization.

When the photochemically active group, the thermochemically active group, the 
chelating group, the reporter group, or the ligand includes a spacer (K), the spacer may suitably 
comprise a chemically cleavable group. 

An additional object of the present invention is to provide oligonucleotides which
combine an increased ability to discriminate between complementary and mismatched targets
with the ability to act as substrates for nucleic acid active enzymes such as, for example, DNA
and RNA polymerases, ligases, and phosphatases. Such oligonucleotides may be used, for
instance, as primers for sequencing nucleic acids and as primers in any of the several well known
amplification reactions, such as the PCR reaction.

Introduction of LNA monomers with natural bases into either DNA, RNA, or pure LNA
oligonucleotides can result in extremely high thermal stability of duplexes with complementary
DNA or RNA, while at the same time obeying the Watson-Crick base-pairing rules. In general,
the thermal stability of heteroduplexes is increased 3-8°C per LNA monomer in the duplex.
Oligonucleotides containing LNA can be designed to be substrates for polymerases (e.g., Taq
polymerase), and PCR based on LNA primers is more discriminatory towards single base
mutations in the template DNA compared to normal DNA primers (e.g., allele specific PCR).
Furthermore, very short LNA oligos (e.g. 5-mers or 8-mers) which have high Tm's when
compared to similar DNA oligos can be used as highly specific catching probes with
outstanding discriminatory power towards single base mutations (e.g., SNP detection).
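
The per-monomer figure quoted above can be turned into a rough planning estimate. The following is a minimal sketch only: it assumes that the 3-8°C contribution of each LNA monomer adds linearly, which the text does not state, and the function name `lna_tm_gain_range` is hypothetical.

```python
def lna_tm_gain_range(n_lna, per_monomer_low=3.0, per_monomer_high=8.0):
    """Return (low, high) estimated Tm increase in degrees C.

    Assumption for illustration: each LNA monomer contributes
    independently, so the per-monomer range is simply multiplied
    by the number of LNA monomers in the duplex.
    """
    if n_lna < 0:
        raise ValueError("n_lna must be non-negative")
    return (n_lna * per_monomer_low, n_lna * per_monomer_high)


# Example: an 8-mer capture probe carrying three LNA monomers.
low, high = lna_tm_gain_range(3)
print(f"Estimated Tm gain: +{low:.0f} to +{high:.0f} degrees C")
```

Measured melting temperatures depend on sequence context and buffer conditions, so an estimate of this kind is only a starting point for probe design.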

LNA oligonucleotides are capable of hybridizing with double-stranded DNA target
molecules as well as RNA secondary structures by strand invasion, as well as of specifically
blocking a wide selection of enzymatic reactions, such as digestion of double-stranded DNA by
restriction endonucleases and digestion of DNA and RNA with deoxyribonucleases and
ribonucleases, respectively.

In a further aspect, oligonucleotides of the invention may be used to construct new
affinity pairs which exhibit enhanced specificity towards each other. The affinity constants can
easily be adjusted over a wide range and a vast number of affinity pairs can be designed and
synthesized. One part of the affinity pair can be attached to the molecule of interest (e.g.
proteins, amplicons, enzymes, polysaccharides, antibodies, haptens, peptides, etc.) by standard
methods, while the other part of the affinity pair can be attached to e.g. a solid support such as
beads, membranes, micro-titer plates, sticks, tubes, etc. The solid support may be chosen from
a wide range of polymer materials such as, for instance, polypropylene, polystyrene,
polycarbonate or polyethylene. The affinity pairs may be used in selective isolation,
purification, capture and detection of a diversity of the target molecules.

Oligonucleotides of the invention also may be employed as probes in the purification,
isolation and detection of, for instance, pathogenic organisms such as viruses, bacteria and
fungi. Oligonucleotides of the invention also may be used as generic tools for the purification,
isolation, amplification and detection of nucleic acids from groups of related species such as, for
instance, rRNA from gram-positive or gram-negative bacteria, fungi, mammalian cells, etc.

Oligonucleotides of the invention also may be employed as an aptamer in molecular
diagnostics, e.g. in RNA mediated catalytic processes, in specific binding of antibiotics, drugs,
amino acids, peptides, structural proteins, protein receptors, protein enzymes, saccharides,
polysaccharides, biological cofactors, nucleic acids, or triphosphates or in the separation of
enantiomers from racemic mixtures by stereospecific binding.

Oligonucleotides of the invention also may be used for labeling of cells, e.g. in methods 
wherein the label allows the cells to be separated from unlabelled cells. 

Oligonucleotides also may be conjugated to a compound selected from proteins,
amplicons, enzymes, polysaccharides, antibodies, haptens, and peptides.

Kits are also provided containing one or more oligonucleotides of the invention for the
isolation, purification, amplification, detection, identification, quantification, or capture of
natural or synthetic nucleic acids. The kit typically will contain a reaction body, e.g. a slide or
biochip. One or more oligonucleotides of the invention may be suitably immobilized on such a
reaction body.

The invention also provides methods for using kits of the invention for carrying out a
variety of bioassays, e.g. for diagnostic purposes. Any type of assay wherein one component is
immobilized may be carried out using the substrate platforms of the invention. Bioassays
utilizing an immobilized component are well known in the art. Examples of assays utilizing an
immobilized component include, for example, immunoassays, analysis of protein-protein
interactions, analysis of protein-nucleic acid interactions, analysis of nucleic acid-nucleic acid
interactions, receptor binding assays, enzyme assays, phosphorylation assays, diagnostic assays
for determination of disease state, genetic profiling for drug compatibility analysis, and SNP
detection (U.S.P.N. 6,316,198; 6,303,315).

Identification of a nucleic acid sequence capable of binding to a biomolecule of interest
can be achieved by immobilizing a library of nucleic acids onto the substrate surface so that
each unique nucleic acid is located at a defined position to form an array. The array is
then exposed to the biomolecule under conditions which favor binding of the biomolecule
to the nucleic acids. Non-specifically binding biomolecules can be washed away using mild
to stringent buffer conditions depending on the level of specificity of binding desired. The
nucleic acid array is then analyzed to determine which nucleic acid sequences bound to
the biomolecule. Desirably the biomolecules carry a fluorescent tag for use in detection
of the location of the bound nucleic acids.

Oligonucleotides of the invention can be employed in a wide range of applications,
particularly in those applications involving a hybridization reaction. Oligonucleotides
also may be used in DNA sequencing aiming at improved throughput in large-scale, shotgun
genome sequencing projects, improved throughput in capillary DNA sequencing (e.g. ABI
Prism 3700) as well as at an improved method for 1) sequencing large, tandemly repeated
genomic regions, 2) closing gaps in genome sequencing projects and 3) sequencing of GC-rich
templates. In DNA sequencing, oligonucleotide sequencing primers are combined with LNA
enhancer elements for the read-through of GC-rich and/or tandemly repeated genomic regions,
which often present many challenges for genome sequencing projects. LNA may increase the
specificity of certain sequencing primers and thus facilitate selection of a particular version of a
repeated sequence, and possibly also use strand invasion to open up recalcitrant GC-rich
sequences.

The incorporation of one or more universal nucleosides into the oligomer makes
bonding to unknown bases possible and allows the oligonucleotide to match ambiguous or
unknown nucleic acid sequences.
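
The effect of a universal nucleoside on sequence matching can be illustrated with a small sketch. It is not a model of hybridization thermodynamics: it only checks Watson-Crick complementarity position by position and treats a universal position as pairing with any base. The symbol `X` for a universal nucleoside and the function name `probe_matches` are assumptions made for this illustration.

```python
# Watson-Crick partners for a DNA target.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
UNIVERSAL = "X"  # hypothetical symbol for a universal nucleoside


def probe_matches(probe, target_site):
    """Return True if probe (5'->3') can pair with target_site (5'->3').

    The strands are antiparallel, so the probe is compared against the
    reversed target. A universal position in the probe pairs with any base.
    """
    if len(probe) != len(target_site):
        return False
    for p, t in zip(probe, reversed(target_site)):
        if p == UNIVERSAL:
            continue  # universal nucleoside: any target base is accepted
        if COMPLEMENT.get(p) != t:
            return False
    return True


# The same probe matches both variants of an ambiguous middle position.
for site in ("CAT", "CGT", "CAA"):
    print(site, probe_matches("AXG", site))
```

Placing the universal position over a base that is unknown or variable lets a single probe cover several target variants, while the remaining positions still discriminate.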

As discussed above, oligonucleotides of the invention may be used for therapeutic
applications, e.g. as antisense, antigene, ribozyme or double-stranded nucleic acid
therapeutic agents. In these therapeutic methods, one or more oligonucleotides of the invention
are administered as desired to a patient suffering from or susceptible to the targeted disease or
disorder, e.g. a viral infection.

In an exemplary in vitro method for measuring the ability of a nucleic acid of the
invention to silence a target gene, cells are cultured in standard medium supplemented with 1%
fetal calf serum as previously described (Lykkesfeld et al., Int. J. Cancer 61:529-534, 1995). At
the start of the experiment cells are approximately 40% confluent. The serum-containing
medium is removed and replaced with serum-free medium. Transfection is performed using,
e.g., Lipofectin (GibcoBRL cat. No 18292-011) diluted 40X in medium without serum and
combined with the oligo to a concentration of 750 nM oligo, 0.8 µg/ml Lipofectin. Then, the
medium is removed from the cells and replaced with the medium containing oligo-Lipofectin
complex. The cells are incubated at 37°C for 6 hours, rinsed once with medium without serum
and incubated for a further 18 hours in DME/F12 with 1% FCS at 37°C. Standard methods are
used to measure the level of mRNA or protein encoded by the target gene and thereby the level
of gene silencing.
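
Preparing the transfection mixture at 750 nM oligo is a simple C1·V1 = C2·V2 dilution. The sketch below shows that arithmetic only; the 100 µM oligo stock and the 1000 µl final volume are hypothetical values chosen for illustration and do not come from the protocol above.

```python
def stock_volume_ul(stock_conc, final_conc, final_volume_ul):
    """Volume of stock (in microlitres) needed for a target concentration.

    Uses C1 * V1 = C2 * V2. Both concentrations must be in the same unit.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("cannot dilute to a concentration above the stock")
    return final_conc * final_volume_ul / stock_conc


# Hypothetical example: 100 uM (100,000 nM) oligo stock, 1000 ul of mixture.
oligo_ul = stock_volume_ul(stock_conc=100_000, final_conc=750,
                           final_volume_ul=1000)
print(f"Add {oligo_ul:.1f} ul oligo stock per 1000 ul transfection mixture")
```

The same function can be reused for the Lipofectin component once the concentration of the reagent actually in hand is known.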

It is also contemplated that information on the structures assumed by a target nucleic
acid may be used in the design of the probes, such that regions that are known or suspected to
be involved in folding may be chosen as hybridization sites. Such an approach will reduce the
number of probes that are likely to be needed to distinguish between targets of interest.

There are many methods used to obtain structural information involving nucleic acids,
including the use of chemicals that are sensitive to the nucleic acid structure, such as
phenanthroline/copper, EDTA-Fe2+, cisplatin, ethylnitrosourea, dimethyl pyrocarbonate,
hydrazine, dimethyl sulfate, and bisulfite. Enzymatic probing may use structure-specific nucleases
from a variety of sources, such as the Cleavase™ enzymes (Third Wave Technologies, Inc.,
Madison, Wis.), Taq DNA polymerase, E. coli DNA polymerase I, and eukaryotic
structure-specific endonucleases (e.g., human, murine and Xenopus XPG enzymes, yeast RAD2
enzymes), murine FEN-1 endonucleases (Harrington and Lieber, Genes and Develop., 3:1344
[1994]) and calf thymus 5' to 3' exonuclease (Murante et al., J. Biol. Chem., 269:1191 [1994]).
In addition, enzymes having 3' nuclease activity such as members of the family of DNA repair
endonucleases (e.g., the RrpI enzyme from Drosophila melanogaster, the yeast RAD1/RAD10
complex and E. coli Exo III) are also suitable for examining the structures of nucleic acids.

If analysis of structure as a step in probe selection is to be used for a segment of nucleic
acid for which no information is available concerning regions likely to form secondary
structures, the sites of structure-induced modification or cleavage must be identified. It is most
convenient if the modification or cleavage can be done under partially reactive conditions (i.e.,
such that in the population of molecules in a test sample, each individual molecule will receive
only one or a few cuts or modifications). When the sample is analyzed as a whole, each reactive
site should be represented, and all the sites may thus be identified. Using a Cleavase Fragment
Length Polymorphism™ cleavage reaction as an example, when the partial cleavage products
of an end-labeled nucleic acid fragment are resolved by size (e.g., by electrophoresis), the result
is a ladder of bands indicating the site of each cleavage, measured from the labeled end. Similar
analysis can be done for chemical modifications that block DNA synthesis; extension of a
primer on molecules that have been partially modified will yield a nested set of termination
products. Determining the sites of cleavage/modification may be done with some degree of
accuracy by comparing the products to size markers (e.g., commercially available fragments of
DNA for size comparison), but a more accurate measure is to create a DNA sequencing ladder
for the same segment of nucleic acid to resolve alongside the test sample. This allows rapid
identification of the precise site of cleavage or modification.
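
The size-marker comparison described above can be sketched numerically. The example assumes the common approximation that log10(fragment length) varies linearly with migration distance between two flanking markers; that approximation, the marker values and the function name `estimate_length` are illustrative assumptions and are not taken from the text. A sequencing ladder run alongside the sample, as the paragraph notes, gives a more accurate answer than this interpolation.

```python
import math


def estimate_length(distance, markers):
    """Estimate fragment length (nt) from migration distance.

    markers: list of (length_nt, migration_distance) pairs for size markers.
    Interpolates log10(length) linearly between the two markers that
    flank the observed distance (an assumed, approximate gel model).
    """
    pts = sorted(markers, key=lambda m: m[1])
    for (len_a, d_a), (len_b, d_b) in zip(pts, pts[1:]):
        if d_b == d_a:
            continue  # skip degenerate marker pairs
        if d_a <= distance <= d_b:
            frac = (distance - d_a) / (d_b - d_a)
            log_len = math.log10(len_a) + frac * (
                math.log10(len_b) - math.log10(len_a))
            return 10 ** log_len
    raise ValueError("distance lies outside the marker range")


# Hypothetical markers: (length in nt, distance migrated in mm).
markers = [(100, 10.0), (50, 14.0), (25, 18.0), (10, 24.0)]
band_mm = 16.0
site = estimate_length(band_mm, markers)
print(f"Band at {band_mm} mm: about {site:.0f} nt from the labeled end")
```

For an end-labeled fragment, the estimated length of each band is the distance of the cleavage or modification site from the labeled end.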

Example 27: General Reaction Conditions for Synthesis of Some Compounds of the Invention

Reactions were conducted under an atmosphere of nitrogen when anhydrous solvents
were used. All reactions were monitored by thin-layer chromatography (TLC) using EM
reagent plates with fluorescence indicator (SiO2-60, F-254). The compounds were visualized
under UV light and by spraying with a mixture of 5% aqueous sulfuric acid and ethanol
followed by heating. Silica gel 60 (particle size 0.040-0.063 mm, Merck) was used for flash
column chromatography. NMR spectra were recorded at 300 MHz for 1H NMR, 75.5 MHz for
13C NMR and 121.5 MHz for 31P NMR on a Varian Unity 300 spectrometer. δ-Values are in
ppm relative to tetramethylsilane as internal standard (1H and 13C NMR) and relative to 85%
H3PO4 as external standard (31P NMR). Coupling constants are given in Hertz. The
assignments, when given, are tentative, and the assignments of methylene protons, when given,
may be interchanged. Bicyclic compounds are named according to the von Baeyer
nomenclature. Fast atom bombardment mass spectra (FAB-MS) were recorded in positive ion
mode on a Kratos MS50TC spectrometer. The composition of the oligonucleotides was
verified by MALDI-MS on a Micromass Tof Spec E mass spectrometer using a matrix of
diammonium citrate and 2,6-dihydroxyacetophenone.

Example 28: Synthesis of 1,2-O-Isopropylidene-5-O-methanesulfonyl-4-C-
methanesulfonyloxymethyl-3-O-(p-methoxybenzyl)-α-D-ribofuranose [Compound 2 in Scheme
1 above]

Mesyl chloride (8.6 g, 75 mmol) was added dropwise to a stirred solution of 4-C-
hydroxymethyl-1,2-O-isopropylidene-3-O-p-methoxybenzyl-α-D-ribofuranose [R. Yamaguchi,
T. Imanishi, S. Kohgo, H. Horie and H. Ohrui, Biosci. Biotechnol. Biochem., 1999, 63, 736] (1,
10.0 g, 29.4 mmol) in anhydrous pyridine (30 cm3) and the reaction mixture was stirred
overnight at room temperature. The mixture was evaporated to dryness under reduced pressure
to give a residue which was co-evaporated with toluene (2 x 25 cm3), dissolved in CH2Cl2 (200
cm3) and washed successively with saturated aqueous NaHCO3 (2 x 100 cm3) and brine (50
cm3). The organic phase was dried (Na2SO4), filtered and evaporated to dryness under reduced
pressure. The colorless viscous oil obtained was purified by column chromatography [0.5-1%
(v/v) MeOH in CH2Cl2 as eluent], followed by crystallization from MeOH to give furanose 2 as
a white solid material (13.6 g, 93%); Rf 0.51 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 7.30 (2 H,
d, J 8.7), 6.90 (2 H, d, J 8.5), 5.78 (1 H, d, J 3.7), 4.86 (1 H, d, J 12.0), 4.70 (1 H, d, J 11.4),
4.62 (1 H, dd, J 5.0 and 3.8), 4.50 (1 H, d, J 11.1), 4.39 (1 H, d, J 12.3), 4.31 (1 H, d, J 11.0),
4.17 (1 H, d, J 5.1), 4.11 (1 H, d, J 11.0), 3.81 (3 H, s), 3.07 (3 H, s), 2.99 (3 H, s), 1.68 (3 H,
s), 1.34 (3 H, s); δC (CDCl3) 159.8, 129.9, 128.8, 114.1, 114.0, 104.5, 83.2, 78.0, 77.9, 72.6,
69.6, 68.8, 55.4, 38.1, 37.5, 26.3, 25.7.
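
The molar amounts and yield reported in this example can be cross-checked from the masses. The sketch below is a sanity check, not part of the procedure: the molecular formulas C17H24O7 (compound 1) and C19H28O11S2 (compound 2) are derived here from the compound names and are therefore assumptions, and the atomic masses are rounded standard values.

```python
# Rounded standard atomic masses in g/mol; adequate for a sanity check.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06}


def molar_mass(formula):
    """Molar mass (g/mol) of a formula given as {element: count}."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def mmol(mass_g, formula):
    """Amount in mmol for a given mass in grams."""
    return 1000.0 * mass_g / molar_mass(formula)


def percent_yield(product_g, product_formula, limiting_mmol):
    """Isolated yield in percent relative to the limiting reagent."""
    return 100.0 * mmol(product_g, product_formula) / limiting_mmol


# Formulas assumed from the compound names (not stated in the text).
DIOL_1 = {"C": 17, "H": 24, "O": 7}
DIMESYLATE_2 = {"C": 19, "H": 28, "O": 11, "S": 2}

start_mmol = mmol(10.0, DIOL_1)
print(f"compound 1: {start_mmol:.1f} mmol")
print(f"yield of 2: {percent_yield(13.6, DIMESYLATE_2, start_mmol):.0f}%")
```

Under these assumed formulas the calculation agrees with the 29.4 mmol of starting material and the 93% yield given in the example.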

Example 29: Synthesis of Methyl 5-O-methanesulfonyl-4-C-methanesulfonyloxymethyl-3-O-
(p-methoxybenzyl)-D-ribofuranoside [Compound 3 in Scheme 1 above]

A suspension of furanoside 2 (13.5 g, 27.2 mmol) in a mixture of H2O (45 cm3) and
15% HCl in MeOH (450 cm3, w/w) was stirred at room temperature for 72 h. The mixture was
carefully neutralized by addition of saturated aqueous NaHCO3 (100 cm3) followed by NaHCO3
(s) whereupon the mixture was evaporated to dryness under reduced pressure. H2O (100 cm3)
was added, and extraction was performed with EtOAc (3 x 100 cm3). The combined organic
phase was washed with brine (100 cm3), dried (Na2SO4), filtered and then evaporated to dryness
under reduced pressure. The residue was coevaporated with toluene (2 x 25 cm3) and purified
by column chromatography [1-2% (v/v) MeOH in CH2Cl2] to give furanoside 3 as an anomeric
mixture (clear oil, 11.0 g, 86%, ratio between anomers ca. 6:1); Rf 0.39, 0.33 (CH2Cl2/MeOH
95:5, v/v); δH (CDCl3, major anomer only) 7.28 (2 H, d, J 8.4), 6.91 (2 H, d, J 8.9), 4.87 (1 H,
s), 4.62 (1 H, d, J 11.4), 4.53 (1 H, d, J 11.2), 4.41 (2 H, s), 4.31 (1 H, d, J 9.8), 4.24 (1 H, d, J
4.6), 4.06 (1 H, d, J 10.0), 3.98 (1 H, br s), 3.81 (3 H, s), 3.33 (3 H, s), 3.06 (3 H, s), 3.03 (3
H, s); δC (CDCl3, major anomer only) 160.0, 130.1, 128.5, 114.3, 107.8, 81.7, 81.2, 73.8, 73.6,
69.7, 69.6, 55.5, 55.4, 37.5, 37.4.

Example 30: Synthesis of (1R,3RS,4R,7S)-1-Methanesulfonyloxymethyl-3-methoxy-7-(p-
methoxybenzyloxy)-2,5-dioxabicyclo[2.2.1]heptane [Compound 4 in Scheme 1 above]

To a stirred solution of the anomeric mixture of Compound 3 (10.9 g, 23.2 mmol) in
anhydrous DMF (50 cm3) at 0 °C was added, during 10 min, sodium hydride (2.28 g of a 60%
suspension in mineral oil (w/w), 95.2 mmol) and the mixture was stirred for 12 h at room
temperature. Ice-cold H2O (200 cm3) was slowly added and extraction was performed using
EtOAc (3 x 200 cm3). The combined organic phase was washed successively with saturated
aqueous NaHCO3 (2 x 100 cm3) and brine (50 cm3), dried (Na2SO4), filtered and evaporated to
dryness under reduced pressure. The residue was purified by column chromatography [0.5-1%
(v/v) MeOH in CH2Cl2] to give first the major isomer (6.42 g, 74%) and then [1.5% (v/v)
MeOH in CH2Cl2] the minor isomer (1.13 g, 13%), both as clear oils; Rf 0.56, 0.45
(CH2Cl2/MeOH 95:5, v/v); δH (CDCl3, major isomer) 7.16 (2 H, d, J 8.8), 6.74 (2 H, d, J 8.4),
4.65 (1 H, s), 4.42-4.32 (4 H, m), 3.95-3.94 (2 H, m), 3.84 (1 H, d, J 7.4), 3.66 (3 H, s), 3.54 (1
H, d, J 7.4), 3.21 (3 H, s), 2.90 (3 H, s); δC (CDCl3, major isomer) 159.6, 129.5, 129.3, 114.0,
105.3, 83.2, 78.6, 77.2, 72.1, 71.8, 66.3, 55.6, 55.4, 37.8; δH (CDCl3, minor isomer) 7.27 (2 H,
d, J 8.9), 6.89 (2 H, d, J 8.6), 4.99 (1 H, s), 4.63-4.39 (4 H, m), 4.19 (1 H, s), 4.10-3.94 (2 H,
m), 3.91 (1 H, s), 3.81 (3 H, s), 3.47 (3 H, s), 3.05 (3 H, s); δC (CDCl3, minor isomer) 159.7,
129.6, 129.5, 114.1, 104.4, 86.4, 79.3, 77.1, 72.3, 71.9, 66.2, 56.4, 55.4, 37.7.

Example 31: Synthesis of (1R,4R,7S)-1-Acetoxymethyl-3-methoxy-7-(p-methoxybenzyloxy)-
2,5-dioxabicyclo[2.2.1]heptane [Compound 5 in Scheme 1]

To a stirred solution of furanoside 4 (major isomer, 6.36 g, 17.0 mmol) in dioxane (25
cm3) was added 18-crown-6 (9.0 g, 34.1 mmol) and KOAc (8.4 g, 85.6 mmol). The stirred
mixture was heated under reflux for 12 h and subsequently evaporated to dryness under
reduced pressure. The residue was dissolved in CH2Cl2 (100 cm3) and washing was performed,
successively, with saturated aqueous NaHCO3 (2 x 50 cm3) and brine (50 cm3). The separated
organic phase was dried (Na2SO4), filtered and evaporated to dryness under reduced pressure.
The residue was purified by column chromatography [1% (v/v) MeOH in CH2Cl2] to give
furanoside 5 as a white solid material (one anomer, 5.23 g, 91%); Rf 0.63 (CH2Cl2/MeOH 95:5,
v/v); δH (CDCl3) 7.27-7.24 (2 H, m), 6.90-6.87 (2 H, m), 4.79 (1 H, s), 4.61 (1 H, d, J 11.0),
4.49 (2 H, m), 4.28 (1 H, d, J 11.0), 4.04 (3 H, m), 3.80 (3 H, s), 3.68 (1 H, m), 3.36 (3 H, s),
2.06 (3 H, s); δC (CDCl3) 170.7, 159.5, 129.5, 129.4, 113.9, 105.1, 83.3, 78.9, 77.2, 72.0, 71.9,
61.0, 55.4, 55.3, 20.8.

Example 32: Synthesis of (1S,4R,7S)-1-Hydroxymethyl-3-methoxy-7-(p-methoxybenzyloxy)-
2,5-dioxabicyclo[2.2.1]heptane [Compound 6 in Scheme 1]

A solution of furanoside 5 (one anomer, 5.16 g, 15.3 mmol) in saturated methanolic
ammonia (200 cm3) was stirred at room temperature for 48 h. The reaction mixture was
evaporated to dryness under reduced pressure, coevaporated with toluene (2 x 50 cm3), and the
residue purified by column chromatography [2-3% (v/v) MeOH in CH2Cl2] to give furanoside
6 as a white solid material (one anomer, 3.98 g, 88%); Rf 0.43 (CH2Cl2/MeOH 95:5, v/v); δH
(CDCl3) 7.27 (2 H, d, J 8.6), 6.88 (2 H, d, J 8.9), 4.79 (1 H, s), 4.59 (1 H, d, J 11.3), 4.53 (1 H,
d, J 11.4), 4.09 (2 H, s), 3.97 (1 H, d, J 7.5), 3.86 (2 H, br s), 3.80 (3 H, s), 3.75-3.62 (2 H, m),
3.37 (3 H, s); δC (CDCl3) 159.4, 129.7, 129.3, 113.9, 105.2, 85.6, 78.3, 77.4, 71.9, 71.8, 58.8,
55.5, 55.3.

Example 33: (1S,4R,7S)-3-Methoxy-7-(p-methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-
2,5-dioxabicyclo[2.2.1]heptane [Compound 7 in Scheme 1]

To a stirred solution of furanoside 6 (one anomer, 3.94 g, 13.3 mmol) in anhydrous
DMF (50 cm3) at 0 °C was added a suspension of NaH [60% in mineral oil (w/w), 1.46 g, 60.8
mmol] followed by dropwise addition of p-methoxybenzyl chloride (2.74 g, 17.5 mmol). The
mixture was allowed to warm to room temperature and stirring was continued for another 4 h
whereupon ice-cold H2O (50 cm3) was added dropwise. The mixture was extracted with CH2Cl2
(3 x 100 cm3) and the combined organic phase was washed with brine (100 cm3), dried
(Na2SO4), filtered, evaporated to dryness under reduced pressure and coevaporated with toluene
(3 x 50 cm3). The residue (4.71 g) tentatively assigned as a mixture of 7 and aldehyde 11 was
used in the preparation of 11 (see below) without further purification.

Example 34: 4-C-Methanesulfonyloxymethyl-3,5-di-O-(p-methoxybenzyl)-1,2-O-
isopropylidene-α-D-ribofuranose [Compound 9 in Scheme 1]

4-C-Hydroxymethyl-3,5-di-O-(p-methoxybenzyl)-1,2-O-isopropylidene-α-D-
ribofuranose [R. Yamaguchi, T. Imanishi, S. Kohgo, H. Horie and H. Ohrui, Biosci. Biotechnol.
Biochem., 1999, 63, 736] (8, 3.2 g, 6.95 mmol) was mesylated using MsCl (2.00 g, 17.5 mmol)
and pyridine (10 cm3) following the procedure described for 2. After work-up, the colorless
viscous oil was purified by column chromatography [1% (v/v) MeOH in CH2Cl2] to give
derivative 9 in 89% yield (3.17 g) as a clear oil; Rf 0.45 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3)
7.22 (2 H, d, J 8.9), 7.18 (2 H, d, J 8.7), 6.86 (4 H, d, J 8.3), 5.76 (1 H, d, J 3.8), 4.83 (1 H, d, J
12.0), 4.64 (1 H, d, J 11.6), 4.59 (1 H, m), 4.49-4.35 (4 H, m), 4.24 (1 H, d, J 5.3), 3.80 (6 H, s),
3.56 (1 H, d, J 10.5), 3.45 (1 H, d, J 10.5), 3.06 (3 H, s), 1.67 (3 H, s), 1.33 (3 H, s); δC (CDCl3)
159.6, 159.4, 129.9, 129.8, 129.7, 129.5, 129.4, 129.3, 114.0, 113.9, 113.8, 113.7, 113.6, 104.5,
84.9, 78.6, 78.1, 73.4, 72.4, 71.0, 69.9, 55.3, 38.0, 26.4, 25.9.

Example 35: Methyl 4-C-methanesulfonyloxymethyl-3,5-di-O-(p-methoxybenzyl)-D-
ribofuranose [Compound 10 in Scheme 1]

Methanolysis of furanoside 9 (3.1 g, 5.76 mmol) was performed using a mixture of a
solution of 15% HCl in MeOH (w/w, 120 cm3) and H2O (12 cm3) following the procedure
described for the synthesis of 3. After work-up, the crude product was purified by column
chromatography [0.5-1% (v/v) MeOH in CH2Cl2] to give the major anomer of 10 (1.71 g, 58%)
and [1-1.5% (v/v) MeOH in CH2Cl2] the minor anomer of 10 (0.47 g, 16%), both as clear oils;
Rf 0.31, 0.24 (CH2Cl2/MeOH 98:2, v/v); δC (major anomer, CDCl3) 159.8, 159.5, 129.9, 129.8,
129.6, 129.5, 129.0, 114.2, 114.1, 114.0, 113.9, 107.9, 84.7, 79.9, 74.2, 73.5, 73.5, 70.2, 64.4,
55.6, 55.4, 37.4.

Example 36: Alternative preparation of Compound 7 in Scheme 1

Ring closure of furanoside 10 (major anomer, 1.68 g, 3.28 mmol) was achieved using
NaH (60% suspension in mineral oil (w/w), 0.32 g, 13.1 mmol) in anhydrous DMF (10 cm3)
following the procedure described for the synthesis of 4 to give a crude product tentatively
assigned as a mixture of furanoside 7 and aldehyde 11 (see below) (1.13 g).

Example 37: (2R,3S,4S)-4-Hydroxy-3-(p-methoxybenzyloxy)-4-(p-
methoxybenzyloxymethyl)-tetrahydrofuran-2-carbaldehyde [Compound 11 in Scheme 1]

A solution of crude furanoside 7 (as a mixture with 11, prepared as described above,
5.80 g) in 80% glacial acetic acid (100 cm3) was stirred at 50 °C for 4 h. The solvent was
distilled off under reduced pressure and the residue was successively coevaporated with
absolute ethanol (3 x 25 cm3) and toluene (2 x 25 cm3) and purified by column chromatography
[4-5% (v/v) MeOH in CH2Cl2] to give aldehyde 11 as a colorless oil (4.60 g); Rf 0.37
(CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 9.64 (1 H, br s), 7.27-7.17 (4 H, m), 6.87-6.84 (4 H, m),
4.59 (1 H, d, J 11.6), 4.51-4.41 (2 H, m), 4.35 (1 H, s), 3.92-3.90 (2 H, m), 3.79 (6 H, s), 3.77-
3.68 (3 H, m), 3.55 (2 H, br s); δC (CDCl3) 203.6, 159.5, 159.4, 129.7, 129.6, 129.5, 129.2,
114.0, 113.9, 113.8, 87.3, 86.7, 81.0, 75.1, 73.4, 71.6, 67.6, 55.3.

Example 38: General procedure for the reaction of aryl magnesium bromides with aldehyde 11
to give Compounds 12a-e in Scheme 2

A solution of aldehyde 11 (Scheme 2) in anhydrous THF (10 cm3) was added dropwise
during 5 min to a stirred solution of the aryl magnesium bromide dissolved in anhydrous THF
at 0 °C. The mixture was allowed to warm to room temperature and stirred for 12 h. The mixture
was evaporated to dryness under reduced pressure and the residue diluted with CH2Cl2 and
washed several times with saturated aqueous NH4Cl. The organic phase was dried (Na2SO4),
filtered, and evaporated to dryness under reduced pressure. Column chromatography of the
crude product obtained afforded the compounds 12a-e as shown in Scheme 2.

Example 38a: Synthesis of (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(phenyl)methyl]-4-(p-
methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12a of
Scheme 2]

Grignard reaction of phenylmagnesium bromide (1.0 M solution in THF, 14.2 cm3, 14.2
mmol) with aldehyde 11 (515 mg, 1.28 mmol) afforded 12a as shown in Scheme 2. The crude
product was purified by column chromatography [4% (v/v) MeOH in CH2Cl2] to give
tetrahydrofuran 12a (540 mg, 88%) as a colorless oil; Rf 0.34 (CH2Cl2/MeOH 95:5, v/v); δH
(CDCl3) 7.40-7.19 (7 H, m), 6.91-6.73 (6 H, m), 4.73 (1 H, d, J 6.4), 4.48 (2 H, s), 4.08 (2 H, s),
3.88 (1 H, d, J 9.4), 3.79 (1 H, m), 3.78 (3 H, s), 3.76 (3 H, s), 3.75-3.69 (2 H, m), 3.50 (1 H, d,
J 9.4), 3.45 (1 H, s), 3.42 (1 H, br s), 3.26 (1 H, br s); δC (CDCl3) 159.5, 159.3, 140.7, 129.7,
129.6, 129.5, 129.2, 128.5, 128.0, 127.3, 113.9, 113.8, 113.7, 89.4, 84.6, 81.8, 75.3, 74.7, 73.5,
71.6, 69.3, 55.3; m/z (FAB) 503 [M+Na]+, 479 [M-H]+, 461 [M-H-H2O]+.

Example 38b: Synthesis of (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(4-fluoro-3-methylphenyl)-
methyl]-4-(p-methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound
12b of Scheme 2]

Grignard reaction of 4-fluoro-3-methylphenylmagnesium bromide (1.0 M solution in
THF, 15.0 cm3, 15.0 mmol) with aldehyde 11 (603 mg, 1.5 mmol) afforded 12b as shown in
Scheme 2. The crude product was purified by column chromatography [4-5% (v/v) MeOH in
CH2Cl2] to give tetrahydrofuran 12b (611 mg, 85%) as a colorless oil; Rf 0.34 (CH2Cl2/MeOH
95:5, v/v); δH (CDCl3) 7.24-7.12 (5 H, m), 6.98-6.84 (5 H, m), 6.77 (1 H, d, J 8.5), 4.65 (1 H,
dd, J 2.8 and 6.4), 4.49 (2 H, s), 4.15 (2 H, s), 4.01 (1 H, dd, J 2.3 and 6.5), 3.87 (1 H, d, J 9.3),
3.79 (3 H, s), 3.78 (3 H, s), 3.76-3.68 (2 H, m), 3.52 (1 H, s), 3.47 (1 H, d, J 10.3), 3.42 (1 H, d,
J 2.9), 3.22 (1 H, s), 2.24 (3 H, d, J 0.8); δC (CDCl3) 162.7, 159.5, 159.4, 136.2, 136.1, 130.3,
130.2, 129.7, 129.6, 129.5, 129.4, 129.1, 126.1, 126.0, 115.1, 114.8, 114.0, 113.9, 113.8, 89.3,
84.5, 81.8, 75.3, 74.0, 73.5, 71.7, 69.2, 55.4, 55.3, 14.7 (d, J 3.9); m/z (FAB) 535 [M+Na]+, 511
[M-H]+, 493 [M-H-H2O]+.

Example 38c: Synthesis of (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(1-naphthyl)methyl]-4-(p-
methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12c of
Scheme 2]

1-Bromonaphthalene (1.55 g, 7.5 mmol) was added to a stirred mixture of magnesium
turnings (182 mg, 7.5 mmol) and iodine (10 mg) in THF (10 cm3). The mixture was stirred at
40 °C for 1 h whereupon it was allowed to cool to room temperature. A solution of aldehyde 11
(603 mg, 1.5 mmol) in THF (10 cm3) was added slowly and the reaction was stirred for 12 h.
The crude product was purified by column chromatography [4-5% (v/v) MeOH in CH2Cl2] to
give tetrahydrofuran 12c (756 mg, 95%) as a colorless oil; Rf 0.35 (CH2Cl2/MeOH 95:5, v/v);
δH (CDCl3) 8.08 (1 H, m), 7.86 (1 H, m), 7.79 (1 H, d, J 8.2), 7.72 (1 H, d, J 7.2), 7.49-7.44
(3 H, m), 7.18 (2 H, d, J 8.4), 6.84 (2 H, d, J 8.6), 6.74 (2 H, d, J 8.7), 6.68 (2 H, d, J 8.8), 5.52
(1 H, dd, J 3.7 and 5.6), 4.45 (2 H, s), 4.34 (1 H, dd, J 2.5 and 5.9), 4.03 (1 H, d, J 11.0), 3.96 (1
H, d, J 11.0), 3.93 (1 H, d, J 9.5), 3.80 (1 H, d, J 9.3), 3.77 (3 H, s), 3.75 (1 H, d, J 2.6), 3.72 (3
H, s), 3.68 (1 H, d, J 9.3), 3.56 (1 H, d, J 3.7), 3.49 (1 H, d, J 9.3), 3.34 (1 H, s); δC (CDCl3)
159.5, 159.3, 136.3, 134.0, 131.0, 129.7, 129.6, 129.5, 129.4, 129.0, 128.6, 128.2, 125.6, 125.5,
123.5, 114.0, 113.8, 113.7, 88.7, 84.7, 81.9, 75.5, 73.5, 71.7, 71.3, 69.3, 55.4, 55.3; m/z (FAB)
553 [M+Na]+, 529 [M-H]+, 511 [M-H-H2O]+.

Example 38d: (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(1-pyrenyl)methyl]-4-(p-methoxy-
benzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12d of Scheme 2]

Tetrahydrofuran 12d was synthesized from aldehyde 11 (515 mg, 1.28 mmol), 1-
bromopyrene (1.0 g, 3.56 mmol), magnesium turnings (155 mg, 6.4 mmol), iodine (10 mg) and
THF (20 cm3) following the procedure described for synthesis of compound 12c. The crude
product was purified by column chromatography [3-4% (v/v) MeOH in CH2Cl2] to give
tetrahydrofuran 12d (690 mg, 89%) as a pale yellow solid; Rf 0.35 (CH2Cl2/MeOH 95:5, v/v);
δH (CDCl3) 8.23 (2 H, d, J 8.4 and 9.2), 8.19-8.13 (3 H, m), 8.05-7.99 (4 H, m), 7.14 (2 H, d, J
8.8), 6.82 (2 H, d, J 9.0), 6.30 (2 H, d, J 8.7), 6.20 (2 H, d, J 8.6), 5.87 (1 H, d, J 7.2), 4.43 (2 H,
s), 4.41 (1 H, m), 4.01 (1 H, d, J 9.4), 3.91 (1 H, d, J 11.8), 3.86 (1 H, d, J 9.2), 3.77 (1 H, d, J
1.9), 3.76 (3 H, s), 3.70-3.64 (3 H, m), 3.52-3.45 (1 H, m), 3.44 (3 H, s); δC (CDCl3) 159.5,
158.9, 133.9, 131.4, 131.1, 130.7, 129.7, 129.5, 129.2, 128.9, 128.5, 127.8, 127.7, 127.5, 126.0,
125.5, 125.3, 125.2, 125.1, 125.0, 124.9, 122.9, 113.9, 113.3, 89.5, 83.5, 82.0, 75.7, 73.4, 71.3,
71.0, 69.3, 55.3, 55.0; m/z (MALDI) 627 [M+Na]+, 609 [M+Na-H2O]+.

Example 38e: (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(2,4,5-trimethylphenyl)methyl]-4-(p-
methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12e of
Scheme 2]



Tetrahydrofuran 12e was synthesized from aldehyde 11 (515 mg, 1.28 mmol), 1-bromo-
2,4,5-trimethylbenzene (1.28 g, 6.4 mmol), magnesium turnings (155 mg, 6.4 mmol), iodine
(10 mg) and THF (20 cm3) following the procedure described for synthesis of compound 12c.
The crude product was purified by column chromatography [3-4% (v/v) MeOH in CH2Cl2] to
give tetrahydrofuran 12e (589 mg, 88%) as a colorless oil; Rf 0.34 (CH2Cl2/MeOH 95:5, v/v);
δH (CDCl3) 7.25 (2 H, d, J 8.7), 7.21 (2 H, d, J 8.9), 6.90 (1 H, s), 6.87 (1 H, s), 6.85 (2 H, d, J
8.9), 6.76 (2 H, d, J 8.7), 4.95 (1 H, dd, J 3.6 and 5.9), 4.48 (2 H, s), 4.18-4.08 (3 H, m), 3.89 (1
H, d, J 9.6), 3.80 (1 H, m), 3.79 (3 H, s), 3.77 (3 H, s), 3.71 (1 H, d, J 9.2), 3.64 (1 H, d, J 2.6),
3.51 (1 H, d, J 9.4), 3.24 (1 H, s), 3.18 (1 H, d, J 3.4), 2.25 (3 H, s), 2.22 (3 H, s), 2.21 (3 H, s);
δC (CDCl3) 159.5, 159.3, 136.0, 135.8, 134.2, 132.5, 132.0, 129.8, 129.7, 129.6, 129.5,
128.5, 113.9, 113.8, 88.6, 84.7, 81.7, 75.4, 73.5, 71.7, 70.9, 69.4, 55.3, 19.5, 19.4, 19.0; m/z
(FAB) 545 [M+Na]+, 521 [M-H]+, 503 [M-H-H2O]+.

Example 39: General procedure for the cyclization of 12a-e to give compounds 13a-e as shown
in Scheme 2.

N,N,N',N'-Tetramethylazodicarboxamide (TMAD) was added in one portion to a stirred
solution of the compounds 12a-e as shown in Scheme 2 and tributylphosphine in benzene at 0
°C. The mixture was stirred for 12 h at room temperature, whereupon it was diluted with diethyl
ether (50 cm³). The organic phase was washed successively with saturated aqueous NH4Cl (2 x
20 cm³) and brine (25 cm³), dried (Na2SO4), filtered and evaporated to dryness under reduced
pressure. The crude product obtained was purified by column chromatography [1.5-2% (v/v)
MeOH in CH2Cl2] to give compounds 13a-e as shown in Scheme 2.

Example 39a: (1S,3S,4R,7S)-7-(p-Methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-3-
phenyl-2,5-dioxabicyclo[2.2.1]heptane [Compound 13a of Scheme 2]

Cyclization of compound 12a (540 mg, 1.13 mmol) in the presence of TMAD (310 mg,
1.8 mmol), PBu3 (364 mg, 1.8 mmol) and benzene (10 cm³) followed by the general work-up
procedure and column chromatography afforded compound 13a as a colorless oil (400 mg,
77%); Rf 0.51 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.36-7.33 (7 H, m), 7.10 (2 H, d, J 8.3),
6.88 (2 H, d, J 8.7), 6.78 (2 H, d, J 8.7), 5.17 (1 H, s, H-3), 4.59 (2 H, br s, -CH2(MPM)), 4.43
(1 H, d, J 11.3, -CH2(MPM)), 4.34 (1 H, d, J 11.3, -CH2(MPM)), 4.19 (1 H, s, H-4), 4.09 (1 H, d,
J 7.7, H-6), 4.06 (1 H, d, J 7.7, H-6), 4.01 (1 H, s, H-7), 3.82-3.77 (5 H, m, -C1-CH2-O-,
OCH3), 3.76 (3 H, s, -OCH3); δC (CDCl3) 159.4, 159.3, 139.4 (C-1'), 130.3, 129.7, 129.5, 129.3,
128.5, 127.5, 125.4, 113.9, 113.8, 85.9 (C-1), 84.1 (C-3), 81.1 (C-4), 77.4 (C-7), 73.7
(-CH2(MPM)), 73.4 (C-6), 71.8 (-CH2(MPM)), 66.3 (-C1-CH2-O-), 55.4 (-OCH3), 55.3 (-OCH3);
m/z (FAB) 467 [M+Na-H2O]+, 461 [M-H]+.
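
The percentage yields quoted in these examples follow from the isolated mass of product, its molar mass and the amount of starting material. The short Python sketch below is an illustration only: it assumes the composition C28H30O6 for compound 13a (inferred here from the compound name and consistent with the m/z 461 [M-H]+ signal, not stated in the text), and the function names are hypothetical.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(composition):
    """Return the molar mass in g/mol for a dict such as {"C": 28, "H": 30, "O": 6}."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


def percent_yield(product_mg, product_molar_mass, substrate_mmol):
    """Isolated yield in percent from product mass (mg) and substrate amount (mmol)."""
    product_mmol = product_mg / product_molar_mass  # mg divided by g/mol gives mmol
    return 100.0 * product_mmol / substrate_mmol


# Assumed composition of compound 13a (C28H30O6); an inference, not a value from the text.
mass_13a = molar_mass({"C": 28, "H": 30, "O": 6})

# Example 39a: 400 mg of 13a isolated from 1.13 mmol of 12a.
yield_13a = percent_yield(product_mg=400.0, product_molar_mass=mass_13a, substrate_mmol=1.13)
print(f"M(13a) = {mass_13a:.1f} g/mol, yield = {yield_13a:.0f}%")
```

Under that assumption the calculation gives about 77%, in agreement with the yield reported for compound 13a.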

Example 39b: (1S,3S,4R,7S)-3-(4-Fluoro-3-methylphenyl)-7-(p-methoxybenzyloxy)-1-(p-
methoxybenzyloxymethyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 13b of Scheme 2]

Cyclization of compound 12b (550 mg, 1.08 mmol) in the presence of TMAD (275 mg,
1.6 mmol), PBu3 (325 mg, 1.6 mmol) and benzene (10 cm³) followed by the general work-up
procedure and column chromatography afforded compound 13b as a colorless oil (445 mg,
84%); Rf 0.52 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.28 (2 H, d, J 8.7), 7.11 (2 H, d, J 8.6),
7.08-7.09 (2 H, m, H-2' and H-6'), 6.94 (1 H, dd, J 8.5 and 9.2, H-5'), 6.88 (2 H, d, J 8.6), 6.79
(2 H, d, J 8.4), 5.08 (1 H, s, H-3), 4.62-4.55 (2 H, m, -CH2(MPM)), 4.45 (1 H, d, J 11.1,
-CH2(MPM)), 4.36 (1 H, d, J 11.6, -CH2(MPM)), 4.13 (1 H, s, H-4), 4.07, 4.03 (1 H each, 2d, J
7.6 each, H-6), 3.99 (1 H, s, H-7), 3.81 (2 H, m, -C1-CH2-O-), 3.80 (3 H, s, -OCH3), 3.77 (3 H,
s, -OCH3), 2.23 (3 H, d, J 1.6, Ar-CH3); δC (CDCl3) 162.3 (C-4'), 159.4, 159.3, 134.8, 134.7,
130.3, 129.6, 129.5, 129.2, 128.5, 128.4, 128.3, 124.2, 115.1, 114.8, 113.9, 113.8, 85.9 (C-1),
83.5 (C-3), 81.0 (C-4), 77.1 (C-7), 73.6 (-CH2(MPM)), 73.4 (C-6), 71.8 (-CH2(MPM)), 66.2
(-C1-CH2-O-), 55.4 (-OCH3), 55.3 (-OCH3), 14.7 (d, J 3.3, Ar-CH3); m/z (FAB) 494 [M]+, 493
[M-H]+.

Example 39c: (1S,3S,4R,7S)-7-(p-Methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-3-(1-
naphthyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 13c of Scheme 2]

Cyclization of compound 12c (700 mg, 1.32 mmol) in the presence of TMAD (345 mg,
2.0 mmol), PBu3 (405 mg, 2.0 mmol) and benzene (15 cm³) followed by the general work-up
procedure and column chromatography afforded compound 13c as a colorless oil (526 mg,
78%); Rf 0.53 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.91-7.86 (2 H, m), 7.78 (1 H, d, J 8.2),
7.73 (1 H, d, J 7.1), 7.53-7.46 (3 H, m), 7.32 (2 H, d, J 8.7), 7.04 (2 H, d, J 8.7), 6.90 (2 H, d, J
8.3), 6.71 (2 H, d, J 8.6), 5.79 (1 H, s, H-3), 4.67-4.61 (2 H, m, -CH2(MPM)), 4.43 (1 H, s,
H-4), 4.38 (1 H, d, J 11.2, -CH2(MPM)), 4.27 (1 H, d, J 10.9, -CH2(MPM)), 4.16 (2 H, br s, H-6),
4.08 (1 H, s, H-7), 3.91, 3.87 (1 H each, 2d, J 11.0 each, -C1-CH2-O-), 3.81 (3 H, s, -OCH3), 3.72
(3 H, s, -OCH3); δC (CDCl3) 159.3, 134.6 (C-1'), 133.5, 130.3, 129.8, 129.7, 129.4, 129.3, 128.9,
128.1, 126.4, 125.8, 125.6, 123.8, 122.7, 113.9, 113.7, 85.7 (C-1), 82.3 (C-3), 79.9 (C-4), 78.2
(C-7), 73.7 (-OCH2(MPM)), 73.5 (C-6), 71.8 (-OCH2(MPM)), 66.3 (-C1-CH2-O-), 55.4
(-OCH3), 55.3 (-OCH3); m/z (FAB) 512 [M]+, 511 [M-H]+.

Example 39d: (1S,3S,4R,7S)-7-(p-Methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-3-(1-
pyrenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 13d of Scheme 2]

Cyclization of compound 12d (650 mg, 1.08 mmol) in the presence of TMAD (275 mg,
1.6 mmol), PBu3 (325 mg, 1.6 mmol) and benzene (10 cm³) followed by the general work-up
procedure and column chromatography afforded compound 13d as a pale yellow solid (496 mg,
79%); Rf 0.53 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 8.29 (1 H, d, J 8.2), 8.18-8.12 (5 H, m),
8.08-8.01 (2 H, m), 7.96 (1 H, d, J 7.5), 7.35 (2 H, d, J 8.5), 6.97 (2 H, d, J 8.9), 6.92 (2 H, d, J
8.8), 6.60 (2 H, d, J 8.8), 6.09 (1 H, s, H-3), 4.71-4.65 (2 H, m, -CH2(MPM)), 4.49 (1 H, s,
H-4), 4.34 (1 H, d, J 11.4, -CH2(MPM)), 4.23 (1 H, d, J 11.1, -CH2(MPM)), 4.25 (1 H, d, J 7.6,
H-6), 4.21 (1 H, d, J 7.8, H-6), 4.16 (1 H, s, H-7), 3.95-3.94 (2 H, m, -C1-CH2-O-), 3.81 (3 H, s,
-OCH3), 3.59 (3 H, s, -OCH3); δC (CDCl3) 159.4, 159.3, 132.2 (C-1'), 131.4, 130.8, 130.7, 130.4,
129.5, 129.4, 128.0, 127.5, 127.4, 126.9, 126.1, 125.6, 125.4, 124.9, 124.8, 124.7, 123.6, 122.0,
113.9, 113.7, 85.9 (C-1), 82.7 (C-3), 80.6 (C-4), 77.9 (C-7), 73.9 (-OCH2(MPM)), 73.5 (C-6),
71.8 (-OCH2(MPM)), 66.3 (-C1-CH2-O-), 55.4 (-OCH3), 55.2 (-OCH3); m/z (FAB) 587
[M+H]+, 586 [M]+.

Example 39e: (1S,3S,4R,7S)-7-(p-Methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-3-
(2,4,5-trimethylphenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 13e of Scheme 2]

Cyclization of compound 12e (550 mg, 1.05 mmol) in the presence of TMAD (275 mg,
1.6 mmol), PBu3 (325 mg, 1.6 mmol) and benzene (10 cm³) followed by the general work-up
procedure and column chromatography afforded compound 13e as a colorless oil (425 mg,
80%); Rf 0.52 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.30 (2 H, d, J 9.0), 7.24 (1 H, s, H-6'),
7.13 (2 H, d, J 8.9), 6.89 (1 H, s, H-3'), 6.88 (2 H, d, J 8.8), 6.79 (2 H, d, J 8.6), 5.18 (1 H, s,
H-3), 4.64-4.57 (2 H, m, -CH2(MPM)), 4.46 (1 H, d, J 11.2, -CH2(MPM)), 4.36 (1 H, d, J 11.5,
-CH2(MPM)), 4.18 (1 H, s, H-4), 4.14 (1 H, s, H-7), 4.09 (1 H, d, J 7.9, H-6), 4.04 (1 H, d, J 7.7,
H-6), 3.86 (2 H, s, -C1-CH2-O-), 3.80 (3 H, s, -OCH3), 3.76 (3 H, s, -OCH3), 2.21 (6 H, s, 2 x
Ar-CH3), 2.17 (3 H, s, Ar-CH3); δC (CDCl3) 159.4, 159.3, 135.5 (C-1'), 134.4, 134.0, 131.7, 131.3,
130.5, 129.9, 129.4, 129.2, 127.2, 113.9, 113.8, 85.6 (C-1), 82.4 (C-3), 79.4 (C-4), 77.6 (C-7),
73.5 (-OCH2(MPM)), 73.4 (C-6), 71.8 (-OCH2(MPM)), 66.3 (-C1-CH2-O-), 55.4 (-OCH3), 55.3
(-OCH3), 19.5 (-CH3), 19.3 (-CH3), 18.4 (-CH3); m/z (FAB) 504 [M]+, 503 [M-H]+.

Example 40: General procedure for the oxidative removal of the p-methoxybenzyl groups to
give compounds 14a-e as shown in Scheme 2.

To a stirred solution of compound 13a-e in CH2Cl2 (containing a small amount of H2O)
at room temperature was added 2,3-dichloro-5,6-dicyanoquinone (DDQ), which resulted in the
immediate appearance of a deep greenish-black color that slowly faded into pale brownish-
yellow. The reaction mixture was vigorously stirred at room temperature for 4 h. The
precipitate was removed by filtration through a short pad of silica gel and washed with EtOAc.
The combined filtrate was washed, successively, with saturated aqueous NaHCO3 (2 x 25 cm³)
and brine (25 cm³). The separated organic phase was dried (Na2SO4), filtered and evaporated to
dryness under reduced pressure. The crude product obtained was purified by column
chromatography [4-5% (v/v) MeOH in CH2Cl2] to give compounds 14a-e.

Example 40a: (1S,3S,4R,7S)-7-Hydroxy-1-hydroxymethyl-3-phenyl-2,5-dioxabicyclo[2.2.1]-
heptane [Compound 14a of Scheme 2]

Compound 13a (400 mg, 0.86 mmol) was treated with DDQ (600 mg, 2.63 mmol) in a
mixture of CH2Cl2 (10 cm³) and H2O (0.5 cm³). After the general work-up procedure and
column chromatography, compound 14a was obtained as a white solid material (128 mg, 66%);
Rf 0.30 (CH2Cl2/MeOH 9:1, v/v); δH ((CD3)2CO/CD3OD; (CD3)2CO was added to the
compound followed by addition of CD3OD until a clear solution appeared) 7.40-7.22 (5 H, m),
4.99 (1 H, s), 4.09 (1 H, s), 4.04 (1 H, s), 4.01 (1 H, d, J 7.7), 3.86 (1 H, d, J 7.7), 3.90 (2 H, br
s), 3.77 (2 H, br s); δC ((CD3)2CO/CD3OD; (CD3)2CO was added to the compound followed by
addition of CD3OD until a clear solution appeared) 140.0, 128.2, 127.2, 125.4, 87.2, 83.7, 83.5,
72.3, 70.2, 58.4; m/z (FAB) 223 [M+H]+.

Example 40b: (1S,3S,4R,7S)-3-(4-Fluoro-3-methylphenyl)-7-hydroxy-1-hydroxymethyl-2,5-
dioxabicyclo[2.2.1]heptane [Compound 14b of Scheme 2]

Compound 13b (400 mg, 0.81 mmol) was treated with DDQ (570 mg, 2.5 mmol) in a
mixture of CH2Cl2 (10 cm³) and H2O (0.5 cm³). After the general work-up procedure and
column chromatography, compound 14b was obtained as a white solid material (137 mg, 67%);
Rf 0.31 (CH2Cl2/MeOH 9:1, v/v); δH (CD3OD) 7.23 (1 H, d, J 8.1), 7.19 (1 H, m), 6.99 (1 H, dd,
J 8.5 and 9.3), 4.99 (1 H, s), 4.09 (1 H, s), 4.06 (1 H, s), 4.03 (1 H, d, J 7.6), 3.93-3.91 (3 H, m),
2.25 (3 H, d, J 1.4); δC (CD3OD) 161.9 (d, J 243.3), 136.4 (d, J 3.4), 129.6 (d, J 5.0), 126.1 (d, J
22.8), 125.5 (d, J 8.0), 115.7 (d, J 22.9), 88.5, 85.0, 84.3, 73.5, 71.3, 59.4, 14.5 (d, J 3.7); m/z
(FAB) 255 [M+H]+.

Example 40c: (1S,3S,4R,7S)-7-Hydroxy-1-hydroxymethyl-3-(1-naphthyl)-2,5-dioxabicyclo-
[2.2.1]heptane [Compound 14c of Scheme 2]

Compound 13c (475 mg, 0.93 mmol) was treated with DDQ (600 mg, 2.63 mmol) in a
mixture of CH2Cl2 (10 cm³) and H2O (0.5 cm³). After the general work-up procedure and
column chromatography, compound 14c was obtained as a white solid material (170 mg, 67%);
Rf 0.31 (CH2Cl2/MeOH 9:1, v/v); δH (CDCl3/CD3OD; CD3OD was added to the compound
followed by addition of CDCl3 until a clear solution appeared) 7.94-7.86 (2 H, m), 7.80-7.74 (2
H, m), 7.55-7.46 (3 H, m), 5.74 (1 H, s), 4.56 (2 H, br s), 4.37 (1 H, s), 4.24 (1 H, s), 4.17-4.11
(2 H, m), 4.04 (2 H, br s); δC (CDCl3/CD3OD; CD3OD was added to the compound followed by
addition of CDCl3 until a clear solution appeared) 134.7, 134.0, 130.2, 129.3, 128.6, 126.8,
126.2, 125.8, 123.8, 122.8, 87.4, 83.1, 82.2, 73.1, 71.5, 59.0; m/z (FAB) 273 [M+H]+, 272 [M]+.

Example 40d: (1S,3S,4R,7S)-7-Hydroxy-1-hydroxymethyl-3-(1-pyrenyl)-2,5-dioxabicyclo-
[2.2.1]heptane [Compound 14d of Scheme 2]

Compound 13d (411 mg, 0.7 mmol) was treated with DDQ (570 mg, 2.5 mmol) in a
mixture of CH2Cl2 (10 cm³) and H2O (0.5 cm³). After the general work-up procedure and
column chromatography, compound 14d was obtained as a white solid material (182 mg, 75%);
Rf 0.32 (CH2Cl2/MeOH 9:1, v/v); δH (CDCl3/CD3OD; CD3OD was added to the compound
followed by addition of CDCl3 until a clear solution appeared) 8.32 (1 H, d, J 7.8), 8.23-8.18 (5
H, m), 8.06 (2 H, br s), 8.01 (1 H, d, J 7.6), 6.06 (1 H, s), 4.47 (1 H, s), 4.36 (1 H, s), 4.27-4.18
(2 H, m), 4.10 (2 H, br s); δC (CDCl3/CD3OD) 132.2, 131.0, 128.5, 127.8, 127.3, 126.5, 125.9,
125.7, 125.1, 123.6, 122.1, 87.7, 83.7, 82.6, 73.1, 71.4, 58.9; m/z (FAB) 347 [M+H]+, 346 [M]+.

Example 40e: (1S,3S,4R,7S)-7-Hydroxy-1-hydroxymethyl-3-(2,4,5-trimethylphenyl)-2,5-
dioxabicyclo[2.2.1]heptane [Compound 14e of Scheme 2]

Compound 13e (355 mg, 0.7 mmol) was treated with DDQ (570 mg, 2.5 mmol) in a
mixture of CH2Cl2 (10 cm³) and H2O (0.5 cm³). After the general work-up procedure and
column chromatography, compound 14e was obtained as a white solid material (120 mg, 65%);
Rf 0.31 (CH2Cl2/MeOH 9:1, v/v); δH (CDCl3/CD3OD; CD3OD was added to the compound
followed by addition of CDCl3 until a clear solution appeared) 7.23 (1 H, s), 6.92 (1 H, s), 5.14
(1 H, s), 4.26 (1 H, s), 4.10 (1 H, s), 4.08 (1 H, d, J 7.7), 4.00-3.95 (3 H, m), 2.23 (6 H, s), 2.21
(1 H, s); δC (CDCl3/CD3OD; CD3OD was added to the compound followed by addition of
CDCl3 until a clear solution appeared) 135.6, 133.9, 133.8, 131.7, 131.2, 126.6, 86.6, 82.1, 81.9,
72.3, 70.6, 58.5, 19.2, 19.0, 18.1; m/z (FAB) 265 [M+H]+, 264 [M]+.

Example 41: General procedure for dimethoxytritylation of compounds 14a-e to give
compounds 15a-e as shown in Scheme 2.

4,4'-Dimethoxytrityl chloride (DMTCl) was added in one portion to a stirred solution of
compound 14a-e in anhydrous pyridine. After stirring the mixture at room temperature for 4 h,
methanol (0.2 cm³) was added and the resulting mixture was evaporated to dryness under
reduced pressure. The residue was coevaporated with anhydrous CH3CN (2 x 5 cm³) and
anhydrous toluene (2 x 5 cm³) and then dissolved in CH2Cl2 (20 cm³, traces of acid removed by
filtration through a short pad of basic alumina). The resulting solution was washed,
successively, with saturated aqueous NaHCO3 (2 x 10 cm³) and brine (10 cm³). The separated
organic phase was dried (Na2SO4), filtered and evaporated to dryness under reduced pressure.
The crude product obtained was purified by column chromatography [0.25-0.50% (v/v) MeOH
in CH2Cl2, containing 0.5% Et3N] affording compounds 15a-e.

Example 41a: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-7-hydroxy-3-phenyl-2,5-
dioxabicyclo[2.2.1]heptane [Compound 15a of Scheme 2]

Dimethoxytritylation of compound 14a (108 mg, 0.49 mmol) using DMTCl (214 mg,
0.63 mmol) in anhydrous pyridine (2 cm³) followed by the general work-up procedure and
column chromatography afforded compound 15a as a white solid material (180 mg, 71%); Rf
0.31 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.66-7.21 (14 H, m), 6.84 (4 H, d, J 8.8), 5.19 (1 H,
s), 4.29 (1 H, s), 4.13 (1 H, s), 4.07 (1 H, d, J 8.4), 4.01 (1 H, d, J 8.3), 3.78 (6 H, s), 3.55 (1 H,
d, J 10.2), 3.50 (1 H, d, J 10.7), 2.73 (1 H, br s); δC (CDCl3) 158.6, 149.8, 144.9, 139.4, 136.2,
135.9, 135.8, 130.3, 130.2, 128.5, 128.3, 128.0, 127.6, 126.9, 125.4, 123.9, 113.3, 86.4, 86.0,
83.8, 83.4, 73.0, 71.6, 60.2, 55.3; m/z (FAB) 525 [M+H]+, 524 [M]+.

Example 41b: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-3-(4-fluoro-3-
methylphenyl)-7-hydroxy-2,5-dioxabicyclo[2.2.1]heptane [Compound 15b of Scheme 2]

Dimethoxytritylation of compound 14b (95 mg, 0.38 mmol) using DMTCl (129 mg,
0.42 mmol) in anhydrous pyridine (2 cm³) followed by the general work-up procedure and
column chromatography afforded compound 15b as a white solid material (126 mg, 61%); Rf
0.32 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.53-7.15 (11 H, m), 6.97 (1 H, dd, J 8.7 and 8.9),
6.84 (4 H, d, J 8.8), 5.11 (1 H, s), 4.26 (1 H, d, J 3.9), 4.08 (1 H, s), 4.03 (1 H, d, J 8.0), 3.95 (1
H, d, J 8.0), 3.78 (6 H, s), 3.54 (1 H, d, J 10.5), 3.47 (1 H, d, J 10.1), 2.26 (3 H, d, J 1.5), 2.08
(1 H, br s); δC (CDCl3) 160.8 (d, J 244.1), 158.7, 144.9, 135.9, 134.7, 134.6, 130.3, 130.2,
130.1, 128.5, 128.4, 128.3, 128.0, 127.0, 125.2, 124.9, 124.4, 124.3, 115.2, 114.9, 113.4, 86.5,
86.0, 83.7, 83.0, 72.9, 71.7, 60.1, 55.3, 14.8 (d, J 3.1); m/z (FAB) 556 [M]+.

Example 41c: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-7-hydroxy-3-(1-naphthyl)-
2,5-dioxabicyclo[2.2.1]heptane [Compound 15c of Scheme 2]

Dimethoxytritylation of compound 14c (125 mg, 0.46 mmol) using DMTCl (170 mg,
0.5 mmol) in anhydrous pyridine (2 cm³) followed by the general work-up procedure and
column chromatography afforded compound 15c as a white solid material (158 mg, 60%); Rf
0.35 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.95-7.86 (3 H, m), 7.79 (1 H, d, J 8.3), 7.58-7.41
(9 H, m), 7.35-7.23 (3 H, m), 6.86 (4 H, d, J 8.8), 5.80 (1 H, s), 4.36 (1 H, s), 4.32 (1 H, d, J
6.5), 4.17 (1 H, d, J 8.3), 4.06 (1 H, d, J 8.0), 3.78 (6 H, s), 3.62-3.56 (2 H, m), 2.00 (1 H, d, J
6.6); δC (CDCl3) 158.7, 144.9, 136.0, 135.9, 134.5, 133.6, 130.3, 129.8, 129.0, 128.3, 128.2,
128.1, 127.0, 126.5, 125.9, 125.6, 123.9, 122.6, 113.4, 86.6, 85.7, 82.5, 81.7, 73.1, 72.6, 60.2,
55.3; m/z (FAB) 575 [M+H]+, 574 [M]+.

Example 41d: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-7-hydroxy-3-(1-pyrenyl)-
2,5-dioxabicyclo[2.2.1]heptane [Compound 15d of Scheme 2]

Dimethoxytritylation of compound 14d (130 mg, 0.38 mmol) using DMTCl (140
mg, 0.42 mmol) in anhydrous pyridine (2 cm³) followed by the general work-up procedure and
column chromatography afforded compound 15d as a white solid material (147 mg, 61%); Rf
0.37 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 8.46 (1 H, d, J 8.0), 8.19-8.00 (7 H, m), 7.61 (2 H,
dd, J 1.6 and 7.4), 7.48 (4 H, d, J 8.3), 7.35 (2 H, dd, J 7.2 and 7.5), 7.25 (1 H, m), 7.15 (1 H,
m), 6.88 (4 H, d, J 9.0), 6.10 (1 H, s), 4.46 (1 H, s), 4.43 (1 H, br s), 4.25 (1 H, d, J 8.1), 4.12 (1
H, d, J 8.1), 3.79 (6 H, s), 3.71-3.63 (2 H, m), 2.22 (1 H, br s); δC (CDCl3) 158.7, 149.8, 144.9,
136.1, 136.0, 135.9, 132.1, 131.4, 130.9, 130.6, 130.3, 130.2, 129.2, 129.1, 128.4, 128.3, 128.2,
128.1, 127.5, 127.4, 127.0, 126.9, 126.2, 125.5, 125.4, 124.9, 124.8, 124.7, 123.8, 123.7, 121.9,
113.4, 86.6, 86.1, 83.2, 82.2, 73.2, 72.4, 60.3, 55.3; m/z (FAB) 649 [M+H]+, 648 [M]+.

Example 41e: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-7-hydroxy-3-(2,4,5-
trimethylphenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 15e of Scheme 2]

Dimethoxytritylation of compound 14e (80 mg, 0.3 mmol) using DMTCl (113 mg, 0.33
mmol) in anhydrous pyridine (2 cm³) followed by the general work-up procedure and column
chromatography afforded compound 15e as a white solid material (134 mg, 78%); Rf 0.32
(CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.55 (2 H, d, J 7.9), 7.45-7.42 (4 H, m), 7.32-7.21 (4 H,
m), 6.93 (1 H, s), 6.84 (4 H, d, J 8.2), 5.20 (1 H, s), 4.40 (1 H, s), 4.08 (1 H, s), 4.04 (1 H, d, J
8.3), 3.95 (1 H, d, J 8.2), 3.78 (6 H, s), 3.56 (1 H, d, J 10.5), 3.47 (1 H, d, J 10.2), 2.24 (3 H, s),
2.22 (3 H, s), 2.19 (3 H, s); δC (CDCl3) 158.6, 145.0, 136.0, 135.7, 134.4, 134.2, 131.8,
131.3, 130.3, 130.2, 128.3, 128.0, 127.2, 126.9, 113.3, 86.4, 85.7, 82.1, 81.8, 73.0, 71.8, 60.2,
55.3, 19.6, 19.3, 18.4; m/z (FAB) 567 [M+H]+, 566 [M]+.

Example 42: General procedure for synthesis of the phosphoramidite derivatives 16a-e as 
shown in Scheme 2. 

2-Cyanoethyl N,N-diisopropylphosphoramidochloridite was added dropwise to a stirred
solution of nucleoside 15a-e and N,N-diisopropylethylamine (DIPEA) in anhydrous CH2Cl2 at
room temperature. After stirring the mixture at room temperature for 6 h, methanol (0.2 cm³)
was added and the resulting mixture was diluted with EtOAc (20 cm³, containing 0.5% Et3N, v/v).
The organic phase was washed, successively, with saturated aqueous NaHCO3 (2 x 10 cm³) and
brine (10 cm³). The separated organic phase was dried (Na2SO4), filtered and evaporated to
dryness under reduced pressure. The residue obtained was purified by column chromatography
[25-30% (v/v) EtOAc in n-hexane containing 0.5% Et3N] to give the amidites 16a-e.

Example 42a: Synthesis of (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-
1-(4,4'-dimethoxytrityloxymethyl)-3-phenyl-2,5-dioxabicyclo[2.2.1]heptane
[Compound 16a of Scheme 2]

Treatment of compound 15a (170 mg, 0.32 mmol) with 2-cyanoethyl N,N-
diisopropylphosphoramidochloridite (85 mg, 0.36 mmol) in the presence of DIPEA (0.4 cm³)
and anhydrous CH2Cl2 (2.0 cm³) followed by the general work-up procedure and column
chromatography afforded phosphoramidite 16a as a white solid material (155 mg, 66%); Rf
0.45, 0.41 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.3, 148.9.

Example 42b: (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-1-(4,4'-
dimethoxytrityloxymethyl)-3-(4-fluoro-3-methylphenyl)-2,5-dioxabicyclo[2.2.1]heptane
[Compound 16b of Scheme 2]

Treatment of compound 15b (95 mg, 0.17 mmol) with 2-cyanoethyl N,N-
diisopropylphosphoramidochloridite (53 mg, 0.22 mmol) in the presence of DIPEA (0.3 cm³)
and anhydrous CH2Cl2 (2.0 cm³) followed by the general work-up procedure and column
chromatography afforded phosphoramidite 16b as a white solid material (85 mg, 66%); Rf 0.45,
0.41 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.3, 148.8.

Example 42c: Synthesis of (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-
1-(4,4'-dimethoxytrityloxymethyl)-3-(1-naphthyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound
16c of Scheme 2]

Treatment of compound 15c (158 mg, 0.28 mmol) with 2-cyanoethyl N,N-
diisopropylphosphoramidochloridite (75.7 mg, 0.32 mmol) in the presence of DIPEA (0.4 cm³)
and anhydrous CH2Cl2 (2.0 cm³) followed by the general work-up procedure and column
chromatography afforded phosphoramidite 16c as a white solid material (127 mg, 60%); Rf
0.47, 0.44 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.2, 149.1.

Example 42d: Synthesis of (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-
1-(4,4'-dimethoxytrityloxymethyl)-3-(1-pyrenyl)-2,5-dioxabicyclo[2.2.1]heptane
[Compound 16d of Scheme 2]

Treatment of compound 15d (140 mg, 0.22 mmol) with 2-cyanoethyl N,N-
diisopropylphosphoramidochloridite (64 mg, 0.27 mmol) in the presence of DIPEA (0.3 cm³)
and anhydrous CH2Cl2 (2.0 cm³) followed by the general work-up procedure and column
chromatography afforded phosphoramidite 16d as a white solid material (124 mg, 68%); Rf
0.51, 0.47 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.4, 149.1.

Example 42e: Synthesis of (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-
1-(4,4'-dimethoxytrityloxymethyl)-3-(2,4,5-trimethylphenyl)-2,5-dioxabicyclo[2.2.1]heptane
[Compound 16e of Scheme 2]

Treatment of compound 15e (130 mg, 0.23 mmol) with 2-cyanoethyl N,N-
diisopropylphosphoramidochloridite (64 mg, 0.27 mmol) in the presence of DIPEA (0.3 cm³)
and anhydrous CH2Cl2 (2.0 cm³) followed by the general work-up procedure and column
chromatography afforded phosphoramidite 16e as a white solid material (111 mg, 63%); Rf
0.44, 0.42 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.0.

Example 43: Synthesis, deprotection and purification of oligonucleotides 

All oligomers were prepared using the phosphoramidite approach on a Biosearch 8750
DNA synthesizer in 0.2 µmol scale on CPG solid supports (BioGenex). The stepwise coupling
efficiencies for phosphoramidites 16a-c (10 min coupling time) and phosphoramidites 16d and
16e (20 min coupling time) were >96% and for unmodified deoxynucleoside and
ribonucleoside phosphoramidites (with standard coupling time) generally >99%, in all cases
using 1H-tetrazole as activator. After standard deprotection and cleavage from the solid support
using 32% aqueous ammonia (12 h, 55 °C), the oligomers were purified by precipitation from
ethanol. The composition of the oligomers was verified by MALDI-MS analysis and the purity
(>80%) by capillary gel electrophoresis. Selected MALDI-MS data ([M-H]-; found/calcd.: ON3
2731/2733; ON4 2857/2857; ON6 3094/3093).
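
As a quick check on the agreement between found and calculated masses, the deviations can be tabulated with a few lines of Python. This is only an illustrative sketch; the dictionary and function names are hypothetical, and the numbers are the three found/calcd. pairs quoted above.

```python
# Selected MALDI-MS data from Example 43: (found, calculated) [M-H]- values.
MALDI_DATA = {"ON3": (2731, 2733), "ON4": (2857, 2857), "ON6": (3094, 3093)}


def mass_error(found, calculated):
    """Return the absolute error in Da and the relative error in percent."""
    delta = found - calculated
    return delta, 100.0 * delta / calculated


for name, (found, calculated) in MALDI_DATA.items():
    delta, percent = mass_error(found, calculated)
    print(f"{name}: found {found}, calcd. {calculated}, error {delta:+d} Da ({percent:+.3f}%)")
```

All three deviations are 2 Da or less, i.e. below 0.1% of the calculated mass.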

Example 44: Thermal denaturation studies

The thermal denaturation experiments were performed on a Perkin-Elmer UV/VIS
spectrometer fitted with a PTP-6 Peltier temperature-programming element using a medium salt
buffer solution (10 mM sodium phosphate, 100 mM sodium chloride, 0.1 mM EDTA, pH 7.0).
Concentrations of 1.5 µM of the two complementary strands were used, assuming identical
extinction coefficients for modified and unmodified oligonucleotides. The absorbance was
monitored at 260 nm while raising the temperature at a rate of 1 °C per min. The melting
temperatures (Tm values) of the duplexes were determined as the maximum of the first
derivatives of the melting curves obtained.
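
The Tm determination described above (maximum of the first derivative of absorbance versus temperature) can be sketched numerically as follows. This is a minimal illustration, not the instrument software: the melting curve is synthetic (a sigmoid centred at an assumed 28 °C), the derivative is a simple central difference, and all function names are hypothetical.

```python
import math


def first_derivative(x, y):
    """Central-difference estimate of dy/dx at the interior points of x."""
    return [(y[i + 1] - y[i - 1]) / (x[i + 1] - x[i - 1]) for i in range(1, len(x) - 1)]


def melting_temperature(temperature_c, absorbance_260):
    """Return the temperature at which the first derivative dA260/dT is largest."""
    slopes = first_derivative(temperature_c, absorbance_260)
    steepest = max(range(len(slopes)), key=slopes.__getitem__)
    return temperature_c[steepest + 1]  # +1 because slopes start at the second point


# Synthetic, purely illustrative melting curve: a sigmoid centred at 28 °C.
temps = [5.0 + 0.5 * k for k in range(111)]  # 5.0 to 60.0 °C in 0.5 °C steps
curve = [0.80 + 0.20 / (1.0 + math.exp(-(t - 28.0) / 3.0)) for t in temps]

tm = melting_temperature(temps, curve)
print(f"Estimated Tm = {tm:.1f} °C")
```

Real melting curves are noisy, so some smoothing before differentiation would normally be needed; that step is omitted here for brevity.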

Example 45: Synthesis of compounds 16a-16e and oligomers containing monomers 17a-17e

LNA containing the derivatives 17a-17e (Figure 1, Scheme 1, Scheme 2) were
synthesized, all based on the LNA-type 2'-O,4'-C-methylene-β-D-ribofuranosyl moiety, which
is known to adopt a locked C3'-endo RNA-like furanose conformation [S. Obika, D. Nanbu, Y.
Hari, K. Morio, Y. In, T. Ishida, and T. Imanishi, Tetrahedron Lett., 1997, 38, 8735; S. K.
Singh, P. Nielsen, A. A. Koshkin and J. Wengel, Chem. Commun., 1998, 455; A. A. Koshkin,
S. K. Singh, P. Nielsen, V. K. Rajwanshi, R. Kumar, M. Meldgaard, C. E. Olsen and J. Wengel,
Tetrahedron, 1998, 54, 3607; S. Obika, D. Nanbu, Y. Hari, J. Andoh, K. Morio, T. Doi and T.
Imanishi, Tetrahedron Lett., 1998, 39, 5401]. The syntheses of the phosphoramidite building
blocks 16a-16e suitable for incorporation of the LNA-type aryl C-glycosides 17a-17e are
shown in Scheme 1 and Scheme 2 and described in detail in the experimental section. In the
design of an appropriate synthetic route, it was decided to utilize a reaction similar to one
described recently in the literature. Thus, stereoselective attack of Grignard reagents of various
heterocycles on the carbonyl group of an aldehyde corresponding to aldehyde 11 (Scheme 2), but
with two O-benzyl groups instead of the two p-methoxybenzyl groups of aldehyde 11 (Scheme
2), has been reported to furnish locked-C-nucleosides [S. Obika, Y. Hari, K. Morio and T.
Imanishi, Tetrahedron Lett., 2000, 41, 215; S. Obika, Y. Hari, K. Morio and T. Imanishi,
Tetrahedron Lett., 2000, 41, 221]. The key intermediate in the synthetic route selected herein,
namely the novel aldehyde 11, was synthesized from the known furanoside 1 [R. Yamaguchi, T.
Imanishi, S. Kohgo, H. Horie and H. Ohrui, Biosci. Biotechnol. Biochem., 1999, 63, 736]
following two different routes. In general, O-(p-methoxy)benzyl protection was desirable
instead of O-benzyl protection, as removal of the benzyl protection at a later stage (i.e., 13 →
14) could also likely result in the cleavage of the benzylic O-Ct bond present, e.g., in
compounds 13 and 14 (Scheme 2). In one route to give aldehyde 11, regioselective p-
methoxybenzylation of the furanoside 1, followed by mesylation and methanolysis, yielded the
anomeric mixture of the methyl furanosides 9. Base-induced cyclization followed by acetyl
hydrolysis afforded the aldehyde 11 in approximately 24% overall yield from 1 (Scheme 1 and
Scheme 2). This yield was improved to following a different strategy. Thus, di-O-mesylation of
1 followed by methanolysis and base-induced intramolecular nucleophilic attack from the 2-OH
group afforded the cyclized anomeric mixture of methyl furanoside 4. Substitution of the
remaining mesyloxy group of 4 with an acetate group, followed by deacetylation, p-
methoxybenzylation and then acetyl hydrolysis afforded the required aldehyde 11 (Scheme 1).

Coupling of the aldehyde 11 with different aryl Grignard reagents yielded selectively
one epimer of each of the compounds 12a-e in good yields (see experimental section for further
details on this and other synthetic steps). Each of the diols 12a-e was cyclized under Mitsunobu
conditions (TMAD, PBu3) to afford the bicyclic β-C-nucleoside derivatives 13a-e. Oxidative
removal of the p-methoxybenzyl protections was achieved in satisfactory yields using DDQ.
Subsequent selective 4,4'-dimethoxytritylation (to give compounds 15a-e) followed by
phosphorylation afforded the phosphoramidite building blocks 16a-e in satisfactory yields. The
configurations of compounds 13, and thus also of compounds 11, 12 and 14-17, were assigned
based on 1H NMR spectroscopy, including NOE experiments.

All oligomers were prepared on the 0.2 µmol scale using the phosphoramidite approach.
The stepwise coupling efficiencies for phosphoramidites 16a-c (10 min coupling time) and
phosphoramidites 16d and 16e (20 min coupling time) were >96% and for unmodified
deoxynucleoside and ribonucleoside phosphoramidites (with standard coupling time) generally
>99%, in all cases using 1H-tetrazole as activator. After standard deprotection and cleavage
from the solid support using 32% aqueous ammonia (12 h, 55 °C), the oligomers were purified
by precipitation from ethanol. The composition of the oligomers was verified by MALDI-MS
analysis and the purity (>80%) by capillary gel electrophoresis.

Example 46: Thermal denaturation studies to evaluate hybridization properties

The hybridization of the oligonucleotides ON1-ON11 (Table 8 below) toward four 9-
mer DNA targets with the central base being each of the four natural bases was studied by thermal
denaturation experiments (Tm measurements; see the experimental section for details).
Compared to the DNA reference ON1, introduction of one abasic LNA monomer AbL (ON2)
has earlier been reported to prevent the formation of a stable duplex above 0 °C (only evaluated
with adenine as the opposite base) [L. Kværnø and J. Wengel, Chem. Commun., 1999, 657].
With the phenyl monomer 17a (ON3), Tm values in the range of 5-12 °C were observed. Thus,
the phenyl moiety stabilizes the duplexes compared to AbL, but universal hybridization is not
achieved, as a preference for a central adenine base in the complementary target strand is
indicated (Table 8). In addition, significant destabilization compared to the ON1:DNA
reference duplex was observed. Results similar to those obtained for ON3 were obtained for
oligomers isosequential with ON3 but containing 17b, 17c or 17e instead of 17a as the central
monomer (Table 8, ON7, ON8 and ON9, respectively).

Table 8. Thermal denaturation experiments (Tm values shown) for ON1-ON11 towards DNA
complements with each of the four natural bases in the central position (a)

DNA target: 3'-d(CACTYTACG)                Y:   A      C      G      T

ON1     5'-d(GTGATATGC)                         28     11     12     19
ON2     5'-d(GTGAAbLATGC)                       <3     n.d.   n.d.   n.d.
ON3     5'-d(GTGA17aATGC)                       12     5      6      7
ON4     5'-d(GTGA17dATGC)                       18     17     18     19
ON5     5'-d[2'-OMe(GTGATATGC)]                 35     14     19     21
ON6     5'-d[2'-OMe(GTLGA17dATLGC)]             39     38     37     40
ON7     5'-d(GTGA17bATGC)                       15     7      6      8
ON8     5'-d(GTGA17cATGC)                       15     7      6      9
ON9     5'-d(GTGA17eATGC)                       13     6      6      7
ON10    5'-d[2'-OMe(GTLGA17bATLGC)]             31     25     26     27
ON11    5'-d[2'-OMe(GTLGA17cATLGC)]             34     27     27     32


a Melting temperatures (T m values/°C) measured as the maximum of the first derivative of the 
melting curve (A260 us temperature) recorded in medium salt buffer (10 mM sodium phosphate, 
100 mM sodium chloride, 0.1 mM EDTA, pH 7.0) using 1.5 fiM concentrations of the two 
strands; A = adenine monomer, C = cytosine monomer, G = guanine monomer, T = thymine 

30 monomer; See Figure 1 and/or Scheme 2 for structures of T*\ Ab L and 17a-17e; DNA 
sequences are shown as d(sequence) and 2'-OMe-RNA sequences as 2'-OMe(sequence); "n.d." 
denotes "not determined". The data reported for ONI have been reported earlier [A. A. 
Koshkin, S. K. Singh, P. Nielsen, V. K. Rajwanshi, R. Kumar, M. Meldgaard, C. E. Olsen and 
J. Wengel, Tetrahedron, 1998, 54, 3607]. The data reported for ON2 has been reported earlier 

35 [L. Kvgrn0 and J. Wengel, Chem. Commun^ 1999, 657]. 
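
The Tm definition used in the footnote to Table 8 (the maximum of the first derivative of the A260 versus temperature curve) can be illustrated with a minimal sketch. The function name, the central-difference scheme and the synthetic two-state curve below are assumptions made for illustration; they are not the instrument software or data used in the experiments.

```python
import math


def melting_temperature(temps, absorbances):
    """Return the temperature at which dA260/dT is largest.

    The derivative is estimated with central differences, so the first
    and last points of the curve are never returned.
    """
    if len(temps) != len(absorbances) or len(temps) < 3:
        raise ValueError("need at least three paired (T, A260) points")
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (absorbances[i + 1] - absorbances[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Hypothetical sigmoidal melting curve centred at 39 °C (not measured data).
temps = [0.5 * i for i in range(161)]  # 0 to 80 °C in 0.5 °C steps
curve = [1.0 + 0.2 / (1.0 + math.exp(-(t - 39.0) / 3.0)) for t in temps]
tm = melting_temperature(temps, curve)
```

On real data the curve would normally be smoothed before differentiation, since noise in A260 is amplified by the derivative.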

The pyrene LNA nucleotide 17d (in ON4) displays more encouraging properties (Table
8). Firstly, the binding affinity towards all four complements is increased compared to ON3
(containing 17a). Secondly, universal hybridization is observed, as shown by the four Tm values
all being within 17-19 °C. With respect to universal hybridization, 17d thus parallels the pyrene
DNA derivative Py [T. J. Matray and E. T. Kool, J. Am. Chem. Soc., 1998, 120, 6191], but the
decrease in thermal stability compared to the ON1:DNA reference is more pronounced for 17d
(-10 °C) than reported for Py (-5 °C in a 12-mer polypyrimidine DNA sequence) [T. J. Matray
and E. T. Kool, J. Am. Chem. Soc., 1998, 120, 6191]. It therefore appears that stacking (or
intercalation) by the pyrene moiety is not favored by the conformational restriction of the
furanose ring of 17d, although comparison of the thermal stabilities of ON2, ON3 and ON4
strongly indicates interaction of the pyrene moiety within the helix.

When measured against an RNA target [3'-r(CACUAUACG)], the Tm value (using the
same experimental conditions as for the experiments described above) of ON3 was 11.9 °C
and that of ON4 was 12.7 °C. For oligomers ON7, ON8 and ON9 (Table 8), the corresponding Tm
values were 11.7, 8.8 and 10.2 °C, respectively.
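
One way to read Table 8 is to look at the spread of the four Tm values for each oligonucleotide: a small spread means that the central monomer barely distinguishes between A, C, G and T in the target. The sketch below uses the Tm values from Table 8 (ON2 is omitted because three of its values were not determined). The 3 °C threshold and the function names are illustrative assumptions, not criteria stated in the text.

```python
# Tm values (°C) from Table 8, ordered by central target base Y = A, C, G, T.
TABLE_8 = {
    "ON1": (28, 11, 12, 19),
    "ON3": (12, 5, 6, 7),
    "ON4": (18, 17, 18, 19),
    "ON5": (35, 14, 19, 21),
    "ON6": (39, 38, 37, 40),
    "ON7": (15, 7, 6, 8),
    "ON8": (15, 7, 6, 9),
    "ON9": (13, 6, 6, 7),
    "ON10": (31, 25, 26, 27),
    "ON11": (34, 27, 27, 32),
}


def tm_spread(values):
    """Difference between the highest and lowest Tm across the four targets."""
    return max(values) - min(values)


def universal_candidates(table, max_spread=3):
    """Oligos whose Tm spread does not exceed max_spread (assumed threshold)."""
    return sorted(name for name, values in table.items() if tm_spread(values) <= max_spread)


spreads = {name: tm_spread(values) for name, values in TABLE_8.items()}
candidates = universal_candidates(TABLE_8)
```

With this threshold only the two oligonucleotides containing the pyrene monomer 17d (ON4 and ON6) are selected, while the DNA reference ON1 shows a spread of 17 °C.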

Example 47: The effect of pyrene LNA monomers in an RNA-like strand.

ON5, ON6, ON10 and ON11 (see Table 8 above) were synthesized. The former is
composed entirely of 2'-OMe-RNA monomers, whereas each of the latter three is composed of
six 2'-OMe-RNA monomers (see Figure 1), two LNA thymine monomers T^L (see Figure 1), and
one central LNA pyrene monomer 17d (oligomer ON6), or one central monomer 17b (ON10) or
17c (ON11). A sequence corresponding to ON6 but with three T^L monomers has earlier been
shown to form a duplex with complementary DNA of very high thermal stability. ON6 is
therefore suitable for evaluation of the effect of introducing high-affinity monomers around a
universal base. As seen in Table 8, the 2'-OMe-RNA reference ON5 binds to the DNA complement
with slightly increased thermal stability and conserved Watson-Crick discrimination (compared
to the DNA reference ON1). Indeed, the LNA/2'-OMe-RNA chimera ON6 displays universal
hybridization behavior, as revealed by the four Tm values (37, 38, 39 and 40 °C). All four Tm
values obtained for ON6 are higher than the Tm values obtained for the two fully complementary
reference duplexes ON1:DNA (Tm = 28 °C) and ON5:DNA (Tm = 35 °C).

These novel data demonstrate that the pyrene LNA monomer 17d displays universal
hybridization behavior both in a DNA context (ON4) and in an RNA-like context (ON6), and
that the problem of decreased affinity of universal hybridization probes can be solved by the
introduction of high-affinity monomers, e.g. 2'-OMe-RNA and/or LNA monomers. Increased
affinities compared to ON7 and ON8 were obtained for ON10 and ON11, respectively, but
universal hybridization behavior was not obtained, as a preference for a central adenine base in
the complementary target strand is indicated (Table 8 above).

Example 48: Base-pairing selectivity in hybridization probes.

A systematic thermal denaturation study with ON6 (Table 11) was performed to
determine base-pairing selectivity. For each of the four DNA complements (DNA target
strands; monomer Y = A, C, G or T) used in the study shown in Table 8 above, ON6,
containing a central pyrene LNA monomer 17d, was hybridized with all four base combinations
in the neighboring position towards the 3'-end of ON6 (DNA target strands; monomer Z = A,
C, G or T, monomer X = T) and the same towards the 5'-end of ON6 (DNA target strands;
monomer X = A, C, G or T, monomer Z = T). In all eight subsets of four data points,
satisfactory to excellent Watson-Crick discrimination was observed between the match and the
three mismatches (Table 11 below, ΔTm values in the range of 5-25 °C).


Table 11. Thermal denaturation experiments (Tm values shown) to evaluate the base-pairing
selectivity of the bases neighboring the universal pyrene LNA monomer 17d in the
2'-OMe-RNA/LNA chimera ON6. In the target strand [3'-d(CAC-XYZ-ACG)], the central three
bases XYZ are varied among each of the four natural bases^a

ON6:    5'-d[2'-OMe(GT^LGA17dAT^LGC)]
Target: 3'-d(CAC-XYZ-ACG)

XYZ    Tm/°C    XYZ    Tm/°C    XYZ    Tm/°C    XYZ    Tm/°C
TAA    26       TCA    22       TGA    22       TTA    29
TAC    26       TCC    29       TGC    26       TTG    31
TAG    24       TCG    24       TGG    30       TTC    32
TAT*   39       TCT*   38       TGT*   37       TTT*   40
AAT    18       ACT    27       AGT    22       ATT    28
CAT    30       CCT    31       CGT    27       CTT    35
GAT    14       GCT    28       GGT    16       GTT    27
TAT*   39       TCT*   38       TGT*   37       TTT*   40

^a See caption below Table 8 for abbreviations and conditions used. The data for matched
neighboring bases (X = Z = T) are marked with an asterisk.
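
The ΔTm range quoted for Table 11 can be checked directly from the tabulated values. The following is a minimal sketch in which each mismatched target is compared with the matched target (X = Z = T) carrying the same central base Y; the dictionary simply transcribes Table 11, and the function name is a hypothetical helper.

```python
# Tm values (°C) from Table 11, keyed by the central target bases XYZ.
TABLE_11 = {
    "TAA": 26, "TCA": 22, "TGA": 22, "TTA": 29,
    "TAC": 26, "TCC": 29, "TGC": 26, "TTG": 31,
    "TAG": 24, "TCG": 24, "TGG": 30, "TTC": 32,
    "TAT": 39, "TCT": 38, "TGT": 37, "TTT": 40,
    "AAT": 18, "ACT": 27, "AGT": 22, "ATT": 28,
    "CAT": 30, "CCT": 31, "CGT": 27, "CTT": 35,
    "GAT": 14, "GCT": 28, "GGT": 16, "GTT": 27,
}


def mismatch_delta_tm(table):
    """Return Tm(match) - Tm(mismatch) for every mismatched neighbor."""
    deltas = {}
    for xyz, tm in table.items():
        x, y, z = xyz
        if x == "T" and z == "T":
            continue  # matched neighbors, used as the reference
        deltas[xyz] = table["T" + y + "T"] - tm
    return deltas


deltas = mismatch_delta_tm(TABLE_11)
smallest, largest = min(deltas.values()), max(deltas.values())
```

The smallest difference (5 °C) is found for CTT and the largest (25 °C) for GAT, in agreement with the stated range of 5-25 °C.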

The results reported herein have several important implications for the design of probes
for universal hybridization: (1) Universal hybridization is possible with a conformationally
restricted monomer as demonstrated for the pyrene LNA monomer; (2) Universal hybridization
behavior is feasible in an RNA context; (3) The binding affinity of probes for universal
hybridization can be increased by the introduction of high-affinity monomers without
compromising the universal hybridization and the base-pairing selectivity of bases neighboring
the universal base.

Based on the results reported herein, it is expected that chimeric oligonucleotides comprising
pyrene and other known universal bases attached at various backbones (e.g. LNA-type monomers,
ribofuranose monomers or deoxyribose monomers in 2'-OMe-RNA/LNA chimeric oligos)
will likewise display attractive properties with respect to universal hybridization behavior. One
example is an oligomer identical with the 2'-OMe-RNA/LNA oligo ON6 but with the 17d
monomer substituted by a pyrenyl-2'-OMe-ribonucleotide monomer.

Example 49: Chimeric oligonucleotides 

These chimeric oligonucleotides comprise pyrene and other known universal
bases attached at various backbones (e.g. LNA-type monomers, ribofuranose monomers or
deoxyribose monomers in 2'-OMe-RNA/LNA oligos). Experimentation with these chimeric
oligonucleotides is intended to evaluate the possibility of obtaining results similar to those for
the 2'-OMe-RNA/LNA oligo ON6 at a lower cost, for example, by substituting Py^L with a
pyrenyl-2'-OMe-ribonucleotide monomer.

Example 50: The use of LNA Oligonucleotide Microarrays Provides Superior Sensitivity and
Specificity in Expression Profiling

A. In vitro synthesis of the yeast spike RNAs

Amplification of the yeast genes was carried out by standard two-step PCR using yeast
genomic DNA as template (21). In the second PCR, a poly-T20 tail was inserted in the
amplicon. The DNA fragments were ligated into the pTRIamp18 vector (Ambion, USA) using
the Quick Ligation Kit (New England Biolabs, USA) according to the manufacturer's
instructions and transformed into E. coli DH-5α (21). Synthesis of in vitro RNA was done
using the MEGAscript™ T7 Kit (Ambion, USA) according to the manufacturer's instructions.

B. Design of the LNA expression arrays

Capture probes were designed using the OligoDesign™ software as described in the 
previous examples. 

C. Printing and hybridisation of the LNA expression microarrays

The LNA oligonucleotide microarrays were printed onto Immobilizer™ MicroArray
Slides (Exiqon, Denmark) using the Packard Biochip I Arrayer (Packard, USA), with a spot
volume of 2 x 300 pl of a 10 µM capture probe solution. Four replicas of each capture probe
were printed on each slide. Mixed-stage Caenorhabditis elegans worm cultures were
cultivated according to standard protocols. RNA was extracted from worm samples using the
FastRNA Kit, GREEN (Q-BIO, USA) according to the supplier's instructions.
Fluorochrome-labelled first strand cDNA was synthesized from worm total RNA or in vitro
synthesized RNA as described (22), followed by purification of the cDNA target, hybridisation
of the microarrays overnight at 65 °C, washing of the slides and drying of the arrays (22). The
slides were scanned using a ScanArray 4000 XL scanner (Perkin-Elmer, USA), and the array
data were processed using the GenePix™ Pro 4.0 software package (Axon, USA).

D. Assessment of sensitivity and specificity in LNA expression microarrays 

To enable direct comparisons between LNA and DNA capture probes in measuring gene
expression levels, specific oligonucleotide capture probes for the Saccharomyces cerevisiae
genes SWI5 and THI4 were designed in the 3'-end of the two ORFs. The capture probes were
synthesized as 50-mer DNA and corresponding LNA-modified oligonucleotides, respectively,
with an LNA substitution at every 3rd nucleotide position. In addition, 40-mer DNA and LNA
oligonucleotides were designed as truncated versions of the 50-mer capture probes, along with
oligonucleotides with 1 to 5 consecutive mismatches positioned centrally in the 50-mer and
40-mer capture probes. All capture probes were synthesized with an anthraquinone group at the
5'-end and a hexaethyleneglycol dimer linker region (HEG2 spacer arm) enabling photocoupling
onto polymer microarray slides as described in US patent No. 6,531,591.

To assess the sensitivity and specificity of the oligonucleotide microarrays, in vitro
synthesized yeast RNA for either SWI5 or THI4 was spiked into C. elegans total RNA for
cDNA target synthesis followed by hybridization of the microarrays with fluorochrome-labelled
cDNA target pools. The incorporation of LNA nucleotides into 50-mer DNA oligonucleotide
capture probes results in a 3 to 4-fold increase in fluorescence intensity levels when hybridized
with the spiked, complex cDNA target pools under standard stringency conditions (Figure 45 a
and c). The sensitivity increase is even more pronounced, 5 to 12-fold, when 40-mer LNA
capture probes are employed. None of the yeast capture probes showed cross-hybridization to
the C. elegans cDNA target control without yeast spike RNA under the same conditions.

The specificity of the oligonucleotide capture probes was examined using a panel of
LNA mismatch oligonucleotides together with the DNA controls. As demonstrated in Figure 45
a and c, the fluorescence intensities obtained with the LNA-modified 40-mer triple mismatch
oligonucleotides show a 3-fold intensity decrease relative to the perfectly matched duplexes. In
contrast, the corresponding 40-mer standard DNA capture probes are neither capable of
forming stable duplexes nor discriminating between the perfect match and mismatched targets
under standard hybridization stringency conditions, resulting in low intensity values from all
40-mer DNA capture probes (Figure 45 a and c). Interestingly, mismatch discrimination with
the 50-mer LNA probes could be significantly improved by increasing the hybridization
temperature from 65 °C to 70 °C (Figure 45 b and d), without compromising their capture
sensitivity. By comparison, the signal intensities from all 50-mer DNA capture probes including
the perfect match oligonucleotides were reduced under the same conditions (Figure 45 b and d).
Considered together, our results strongly support the contention that LNA oligonucleotide
capture probes are significantly more sensitive and specific than DNA probes, being able to
discriminate between highly homologous (90 %) mRNAs with a 5 to 10-fold increase in
sensitivity.

In a typical cell, mRNAs can be subdivided into three kinetic classes: (i) highly
abundant (30-90 % of the total mRNA mass, 0.1 % of the sequence complexity); (ii) medium
abundant (50 % mass, 2-5 % of complexity); and (iii) low-abundant mRNAs (<1% mass, >90
% of complexity). In addition, alternative splicing has been shown to be prevalent in higher
eukaryotes, where at least 50 % of the genes appear to be alternatively spliced, thereby
generating additional diversity within the transcriptome. It is thus of utmost importance that the
dynamic range, sensitivity and specificity of the expression profiling technology used are
optimal, especially when analyzing expression levels of messages and mRNA splice variants
belonging to the low-abundant class of high mRNA sequence complexity. A common problem
for all DNA oligonucleotide microarrays is the need for an adequate compromise with respect
to the sensitivity and specificity of the platform. In the present example LNA oligonucleotide
microarrays perform better in expression profiling than microarrays with corresponding DNA
probes. Our results clearly demonstrate that both the specificity and sensitivity in target
molecule capture can be improved using LNA oligonucleotide microarrays, enabling
discrimination between highly homologous mRNAs and alternative splice variants with a
simultaneous increase in sensitivity.

Figure 45 shows the sensitivity and specificity of LNA oligonucleotide capture probes (black
bars) compared to DNA capture probes (white bars) on expression microarrays. Fluorescence
intensity is shown in arbitrary units (relative measurements). The arrays comprising 50-mer and
40-mer perfect match and 1-5 mismatch capture probes were hybridized at 65 °C in 3xSSC with
Cy3-labelled cDNA from 10 µg C. elegans total RNA spiked with yeast a) SWI5 RNA and c)
THI4 RNA. Improved mismatch discrimination with the 50-mer LNA probes upon increasing
the hybridization temperature from 65 °C to 70 °C is demonstrated for arrays hybridized with
Cy3-labelled cDNA from 10 µg C. elegans total RNA spiked with yeast b) SWI5 RNA and d)
THI4 RNA.


Example 51: Improved sensitivity in the on-chip capture of yeast HSP78 mRNA using
LNA-substituted 25-mer oligonucleotide capture probes.

A. Capture probe design 

Unique capture probes for the yeast HSP78 gene were designed using the OligoDesign
software, described in Figure 27. The design options used were: (i) length of each
oligonucleotide probe was 25 nucleotides; (ii) Blast word length 7; (iii) Blast expectation
cut-off 1000; other options were as default. Twenty-four DNA capture probes were selected.
Furthermore, three different LNA-substituted probes were designed based on the sequences of
the 24 DNA capture probes selected by OligoDesign: optimal LNA_T, optimal LNA_TC and
LNA_3. In the LNA_T design the DNA T nucleotides were substituted with LNA T. For the
LNA_TC design, LNA T and C nucleotides were used to substitute DNA t and c. In the LNA_3
design every third DNA nucleotide was substituted with the corresponding LNA nucleotide. For
the LNA_T and LNA_TC designs, no blocks of LNAs were allowed; in addition the LNAs were
substituted in a pattern providing a narrower duplex melting temperature range compared to
the DNA Tm range. In addition, an equivalent set of capture probes with a single mismatch in
the central nucleotide position was designed. Altogether, 192 capture probes were designed,
including an anthraquinone (AQ29) 5'-modifier and a hexaethyleneglycol dimer (HEG2) at the
5'-end of each probe, as shown in Table 13.
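
The LNA_3 design rule can be sketched as a small function, following the Table 13 notation in which LNA positions are written as capital letters and LNA methyl-C as mC. The function name is hypothetical, and the choice to start at the 5'-terminal nucleotide and leave the 3'-terminal nucleotide as DNA is an assumption inferred from the LNA_3 entries in Table 13; it is not stated in the text. The optimal LNA_T and LNA_TC patterns were selected by the design software and are not reproduced by this sketch.

```python
def lna3_pattern(dna):
    """Mark every third nucleotide, starting at the 5'-end, as LNA.

    LNA positions are written in upper case and LNA C as 'mC'.
    Assumption: the 3'-terminal nucleotide is left as DNA.
    """
    dna = dna.lower()
    out = []
    for i, base in enumerate(dna):
        substituted = i % 3 == 0 and i < len(dna) - 1
        if not substituted:
            out.append(base)
        elif base == "c":
            out.append("mC")
        else:
            out.append(base.upper())
    return "".join(out)


# DNA sequence of the perfectly matched probe YDR258C_PM_356 (Table 13).
probe = lna3_pattern("gatgaactcaggtggataggatctt")

# 24 probe positions x 4 designs (DNA, LNA_T, LNA_TC, LNA_3) x 2 (PM and MM).
n_probes = 24 * 4 * 2
```

For this probe the sketch gives the same string as the YDR258C_PM_LNA3_356 entry in Table 13, and the probe count works out to the 192 capture probes stated above.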

B. Determination of the duplex melting temperatures (Tm). 

The duplex melting temperatures of the DNA, LNA_T and LNA_TC designed probes were
measured using the Perkin Elmer Lambda 40 Spectrophotometer according to Wahlestedt
et al. PNAS 97/10 2000. All oligos were measured twice, and if the replica values deviated by
more than 1 °C, a third or a fourth measurement was carried out. The average Tm for each
oligonucleotide duplex is presented in Table 14.

C. In vitro synthesis of fluorochrome-labeled yeast HSP78 RNA

C1. Genomic DNA was prepared from a wild type standard laboratory strain of Saccharomyces
cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences) according to
the supplier's instructions.

C2. PCR amplification of the partial yeast gene was done by standard PCR using yeast
genomic DNA as template. In the first step of amplification, a forward primer containing a
restriction enzyme site and a reverse primer containing a universal linker sequence were used.
In this step 20 bp was added to the 3'-end of the amplicon, next to the stop codon. In the second
step of amplification, the reverse primer was exchanged with a nested primer containing a
poly-T20 tail and a restriction enzyme site. The SWI5 amplicon contains 730 bp of the SWI5 ORF
plus 20 bp universal linker sequence and a poly-A20 tail.
The PCR primers used were:
YDR258C-For-SacI: acgtgagctcttttgacatgtcagaatttcaag
YDR258C-Rev-Uni: gatccccgggaattgccatgttacttttcagcttcctcttcaac 
Uni-polyT-BamHI: acgtggatccttttttttttttttttttttgatccccgggaattgccatg, 
C3. Plasmid DNA constructs 

The PCR amplicon was cut with the restriction enzymes EcoRI + BamHI. The DNA fragment
was ligated into the pTRIamp18 vector (Ambion) using the Quick Ligation Kit (New England
Biolabs) according to the supplier's instructions and transformed into E. coli DH-5α by
standard methods.
C4. DNA sequencing 

To verify the cloning of the PCR amplicon, plasmid DNA was sequenced using M13 forward
and M13 reverse primers and analysed on an ABI 377.
C5. Biotin labeling of cRNA 

One µg of plasmid containing the HSP78 sequence was linearized with restriction enzyme
BamHI (Amersham Pharmacia Biotech, USA) for 2 hours at 37 °C. The RNA was labeled with
biotin-CTP and biotin-UTP using the MessageAmp aRNA kit from Ambion (USA) according
to the manufacturer's instructions. Following hybridisation, the slides were stained with
Streptavidin Phycoerythrin (Molecular Probes, S-866, USA) according to the GeneChip
Expression Analysis Technical Manual (Affymetrix, USA).
C6. Fluorochrome labeling of spike RNA 

In vitro synthesized spike RNA from the HSP78 plasmid construct was labeled with either
Cy3-ULS or Cy5-ULS (Amersham Biosciences, USA) according to the manufacturer's instructions,
followed by filtration through a ProbeQuant G50 Micro Column and Microcon30 (Millipore,
USA). The labeling efficiency was monitored using the Nanodrop spectrophotometer
(Nanodrop Technologies, USA).

D. Microarray fabrication 
The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark) using
the MicroGrid II from Biorobotics (UK) using a 20 µM capture probe solution for each
oligonucleotide probe. Four replicas of each capture probe were printed on the slides.

E. Hybridization with fluorochrome-labelled cRNA

The arrays were hybridized for 16 hours using the following protocol. The labelled RNA
samples were hybridized in a hybridization solution (20 µL final volume) containing 3xSSC
(final concentration), 25 mM HEPES, pH 7.0 (final concentration), 1.25 µg/µL yeast tRNA,
0.3% SDS. The labeled RNA target sample was filtered in a Millipore 0.22 micron spin column
according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by
incubating the reaction at 100 °C for 2 min. The sample was cooled at 20-25 °C for 5 min by
spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA)
was carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide
and the hybridization mixture was applied to the array from the side. An aliquot of 30 µL of
3xSSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray
Slide was placed in the hybridization chamber. The chamber was sealed watertight and incubated
at 45 °C, 55 °C or 65 °C for 16-18 hours submerged in a water bath. After hybridisation, the
slide was removed carefully from the hybridization chamber and washed using the following
protocol. The LifterSlip coverslip was washed off in 6xSSC, pH 7.0 containing 0.1% Tween20 at
50 °C for 15 min, followed by washing of the microarrays in 0.4xSSC, pH 7.0 at 50 °C for 30 min.
Finally the slides were washed for 5 seconds in 0.05xSSC, pH 7.0. The slides were then dried
by centrifugation in a swinging bucket rotor at approximately 200 x g for 2 min.

F. Data analysis. 

Following washing and drying, the slides were scanned using a ScanArray 4000XL scanner
(Perkin-Elmer Life Sciences, USA), and the array data were processed using the GenePix™ Pro
4.0 software package (Axon, USA).

G. RESULTS 

G1. Duplex melting temperatures

The Tm data clearly show that LNA-substituted oligonucleotide capture probes have a
significantly increased average duplex melting temperature compared to the corresponding
DNA probes. Furthermore, the difference in melting temperature between the perfectly
matched (PM) and single mismatched (MM) LNA-substituted probes, designated as ΔTm, is
significantly higher than the corresponding ΔTm for DNA probes (Table 12).

Table 12. The average difference in melting temperature between the perfectly matched (PM)
and single mismatched (MM) probes in different capture probe designs. The observed
difference between the DNA and LNA substituted probes is statistically significant as revealed
by a t-Test; Two-Sample Assuming Unequal Variances.

              Average   Max    Min    ΔTm   St.dev.   T-Test
MM-DNA        60.4      68.5   54.3
MM-LNA_T      67.6      74     58.1
MM-LNA_TC     72.0      79.3   61.6
PM-DNA        66.3      72.8   61     5.9   1.52
PM-LNA_T      74.2      80.8   66.3   6.6   1.32      0.047
PM-LNA_TC     80.0      86.8   71.4   8.0   2.65      0.001/0.017

G2. Microarray hybridization results

Both LNA_T and LNA_3 substituted 25-mer probes are capable of providing highly accurate
measurements for fold-of-change in gene expression levels, as depicted in Figure 46. The
DNA capture probes did not provide any hybridisation signals under the given microarray
hybridisation conditions (Fig. 46). Figure 46 shows the expected (black bars) and observed
(white bars) fold-of-change in the expression levels of the Cy3-ULS labelled HSP78 spike
RNA as measured by on-chip capture using different oligonucleotide capture probes. In the
hybridization experiments, 1 ng HSP78 in vitro spike RNA or 200 pg HSP78 in vitro spike RNA
was used, respectively. Thus, the fold change of the HSP78 RNA between the two hybridizations
in the comparison is 5-fold. Fourteen additional synthetic in vitro mRNA spike controls were
included in the hybridisation solution as a semi-complex background RNA mixture. Seven of
these spikes were used as normalization controls, the other seven were used as negative
controls. Hybridization temperature was 65 °C for 16 hours, and post-hybridization washes were
as described above. Under these conditions the DNA capture probes did not produce
hybridization signals. Figure 47 shows measured intensity levels by on-chip capture using three
different 25-mer oligonucleotide capture probe designs. One (1) ng biotin-labeled HSP78 target
was used in the hybridization experiments, followed by staining with Streptavidin Phycoerythrin.
The LNA_T and LNA_TC substituted 25-mer capture probes show a significantly enhanced
on-chip capture of the HSP78 RNA target, compared to the DNA 25-mer control probes under four
different hybridization stringency conditions.

Table 13. Design of the yeast HSP78 capture probes. YDR258C denotes the ORF name of the
S. cerevisiae HSP78 gene. The numbers refer to the nucleotide position from the 3'-end of the
HSP78 mRNA sequence. PM = perfectly matched probe, MM = single mismatch probe, LNA
substitutions are depicted by capital letters, mC denotes LNA methyl-C.


Oligo name 

Sequence 

Oligo name 


Sequence 

VHD1 COP DM f\A 1 

ut ggt agcacgacaagcitagt a t 



TTTggTagcacgacaagcTTagTat , 

Y DK25tH-._rM_.U7o 

cactctaacagt ttcgccgt ttcta 

VnDTCOP d\j A7QT 


cac i c i aacagi ■ icgccg i i i c i a 

YDK258C_PM_ 1 24 

gugcc a i ggagt tcaaaatctg tc 

Y DK23oL_PiV1_ 1 24 1 


g iTgccaTgg a gTtcaaaaTcTgTc 

Y UKZ58C_rM_ J 64 

tcaatggccttgcaccatataaug 

vnoocQp OKA t <./<T" 
T UK2 joL^rM. 1 04 1 


i caa i ggcc i i gcacca i a i aa i i g 

Y D R25 8C_PM_20 1 

tcagttagccaat ccttcgcticat 

YDR258C_PM_20 1 1 


TcagTTagccaaTccTTcgcTTcat 

YDR258C_PM__249 

Utitcggccaaacgatcttgaait 

YDR258C__PM_249T 


Ti l I icggccaaacgaTc I I gaaTt 

YDR258C_JPM_295 

attgacctcaaaactttcttggata 

YDK258C_PM_Zy5 1 


aTTgaccTc aaaa cTTTcTTgg aTa 

YDR258C_PM_356 

gatgaactcaggtggataggatctt 

YDR258C_PM_35o 1 


gaTgaacTcaggTggaTag gaTcTt 

YDR258C_PM_424 

accaicatcacccaacmgtgtcg 

YDR258C_PM_424T 


accaTcaTcacccaac I " I " 1 gTgTcg 

YDR258C_PM_433 

cccaacttcgtgtcgcttaaiaaaa 

YDK258C_PM_4J3 1 


cccaac 1 1 Iglglcgl I I aa 1 aaaa 

YDR258C_PM_486 

caatgatcgtgttacggaaatcaac 

YDR258C_PM__48o I 


caaTgaTcgTgtTacggaaaTcaac 

YDR258C_PM_5 1 5 

tggcctagggaatcggtcagcttac 

YDR258C_PM_5 1 5T 


TggccTagggaaTcggTcagcTTac 

YDR258C_PM_566 

ttggaaacatcggggtgcgcttut 

YDR258C_PM_566T 


TTgg aa ac aTcggggTgcgcTTTTl 

YDR258C_PM_569 

gaaacatcggggtgcgc ttittcaa 

YDR258C_PM_569T 


gaaacaTcggggTgcgc I I r 1 1 caa 

I ut\ZjOv. r XVI oi>*f 

a a o f o a n a ft <■» a t o o o tt t* 1 1 f r> f r e* 

amine go t. a g u <ii oag^L in c nc 




YLJK23eC_rM__o3 1 

gacagccccagttaaltggccacca 

YL/K25c$C_rIvl_OJ 1 1 


gacagccTcagtTaa l Tggccacca 

Y DR258C_PM_686 

ccgattaaacgagagacagtatgct 

YDR258C_PM_ooo I 


ccgaTTaaacgagagacagTaTgct 

YDR258C_PM_757 

atcaaat aggaat tcagctaaagcc 

YDR258C_PM_757T 


aTcaaaTaggaa I " 1 cagcTaa agee 

YDR258C_PM_8I3 

gacct aagaacat aaagccggcaat 

YDR258C_PM_8I3T 


gaccTaagaacaTaaagcTggcaat 

YDR258C_PM_823 

aiaaagctggcaataggtctctltt 

YDR258C_PM_823T 


aTaaagcTggcaaTaggTcTcTTTt 

YDR258C_PM_870 

agacgtacagcaicagaaatagcag 

YDR258C.PM.870T 


agacgTacagcaTcagaaaTagcag 

YDR258C_PM_888 

aaatagcagcaat ggcctcgtctig 

YDR258C_PM„888T 


aaaTa gcagcaaTggccTcgTcTTg 

YDR258C_PM_890 

atagcagcaatggcctcgtcttggc 

YDR258C_PM_890T 


aTagcagcaaTggccTcgTcTTggc 

YDR258C_PM_896 

gcaaiggcctcgicitggccaacga 

YDR258C_PM_896T 


gcaaTggccTcgTcTTggccaacga 

YDR258C_PM_043 

TtTggTagmCamCgamCaagmCtT 




l L, 

agTai 

YDR258C_PM_LNA3. 

.043 

TStfm A rr»-> Ann A #-•»» A r»r»TTt if*, tat 

1 1 tog i Age A eg /\ca/\gc i taoidL 

YDR258CJ>M_078 

mCamCimCtaamCagtTlcgmCcgT 




TC 

tTmCTa 

YDR258C_PM_LNA3^078 

mCacTctAacAgiTtcGccGtiTcta 

YDR258C_PM_124 

gTt gmC mCaTggagTt rnCaaaaTm 




TC 

CtgTc 

YDR258C_PM^LNA3_ 

.124 

GttGccAtgGagTtcAaaAicTgic 

YDR258C_PM_J64 

TmCaaTggmCcTTgmCac mCaTa 




TC 

TaaTTg 

YDR258C_PM_LNA3_ 

.164 

TcaAtgGccTtgmCacmCaiAiaAttg 

YDR258C_PM_201 

TmCagTtagc mCaaTcmCiTmCgm 




TC 

CitmCat 

YDR258C_PM_LNA3_424 

Acc Ate AtcAcc mCaa mCttTgtGicg 

YDR258C_PMJ249 

TtTtTmCgg mCmCaaa mCgat mCl 




TC 

TgaaTt 

YDR258C_PM_LNA3_486 

mCaaTgaTcgTgtTacGgaAatmCaac

YDR258C_PM_295 aTTgamCmCTmCaaaamCTTtmC

TC TTggaTa 

YDR258C_PM_356 gaTgaamCTmCaggTggaTaggaT 

TC mCTi 

YDR258C_PM_424 amCcaTcaTcacmCmCaaccTigTgt 

TC mCg 

YDR258C_PM_433 mCmCmCaacTTtgTgTcgTTTaaT 

TC aaaa 

YDR258C_PM_486 mCaaTgaTmCgTgtTamCggaaaT 

TC mCaac 

YDR258C_PM_5 1 5 TggmCcTagggaaTmCggTcagmCt 

TC Tac 

YDR258C_PM_566 TTggaaamCatmCggggTgmCgctT 

TC tTt 

Y DR258C_PM_569 gaaamCaTmCggggTgmCgcttiTim 
TC Caa 

YDR258C_PM_604 aaaamCgamCagmCaTaaggmCTt 

TC TmCtTc 

YDR258CJPM_63 ! gamCagmCcTcagtTaaTTggcmCa 

TC cmCa 

YDR258C_PM_686 mCmCgaTTaaamCgagagamCagT 

TC aTgmCt 

YDR258CJPMJ757 aTmCaaaTaggaaTtmCagmCTaaa 

TC gmCc 

YDR258C_PM_8 1 3 gamCmCTaagaamCaTaaagmCTg 

TC gmCaai 

YDR258C_PM_823 aTaaagmCTggmCaaTaggTmCTc 

TC TTTt 

Y DR258C_PM_870 agamCgTamCagmCaTmCagaaaT 
TC agmCag 

YDR258C_PM_888 aaaTagmCagmCaaTggmCcTmCg 

TC TctTg 

YDR258C_PM_890 aTagmCagcaaTggmCcTcgimCtT 

TC ggc 
YDR258C_PM_896 

TC gcaatggcctmCgTmCitggccaacga

YDR258C_PM_LN A3_5 I 5 TggnCci AggGaaTcgGtcAgcTtac

YDR258C_PM_LNA3_566 TtgGaaAcaTcgGggTgcGctTm 

YDR258C_J>MJ-NA3_569 GaaAcaTcgCggTgcGctTuTcaa 

YDR258C_PM_LNA3_604 AaaAcgAcaGcaTaaGgcTitmCttc 

YDR258CJPM_LNA3_757 AtcAaaTagGaaTicAgcTaaAgcc 

YDR 258C_PM_LN A3_8 1 3 GacmCiaAga AcaTaaAgcTggmCaat 

YDR258C_PM_LNA3_823 AtaAagmCigGcaAiaGgtmCtcTiti 

Y DR258C_PM_LN A3.870 AgamCgt AcaGcaTcaGaa AtaGcag 

AaaTagrnCagmCaaTggrnCctmCglmCt 

YDR258C__PM_LNA3__888 Ig 

YDR258C_PM_LNA3_890 AtaGcaGcaAtgGccTcgTctTggc 

YDR258C_PM_LNA3_896 GcaAtgGccTcgTctTggmCcaAcga 

YDR258C_PM_LN A3_63 1 GacAgcmCtc AgtTaaTtgGccAcca 

YDR258Q_PM_LNA3j586 mCcgAttAaamCgaGagAcaGtaTgcr 

YDR258CJPM_LNA3_356 GatGaamCtcAggTggAtaGgaTctt 

YDR258C_PM_LNA3_201 TcaGttAgcmCaaTccTtcGctTcat 

YDR258C_PM_LNA3_249 TttTtcGgcmCaaAcgAtcTtgAart 

YDR258C_PM_LNA3_295 AttGacmCtcAaaActTtcTtgGata 

YDR258C_PM_LNA3_433 mCccAacTttGtgTcgTttAatAaaa 


YDR258Q.MM 
YDR258C_MM. 
YDR258C.MM. 
YDR258C_MM 
YDR258C.MM. 
YDR258C_MM. 
YDR258C.MM 


_043 tttggtagcacgtcaagcttagiat 
_078 cactctaacagtatcgccgmcia 
_ 1 24 gttgccatggagatcaaaatctgic 
.164 tcaatggccttggaccatataattg 
_20l tcagtiagccaaaccttcgcttcat 
.249 tttttcggccaatcgatcttgaatt 
.295 atlgacctcaaatctttcttggata 


YDR258C_MM.043T TTTggTagcacgicaagcTTagTat 

YDR258C_MM_078T cacTcTaacagtatcgccgTTTcTa 

YDR258C_MM_124T gTTgccaTggagalcaaaaTcTgTc 

YDR258C_MM_164T TcaaTggccTTggaccaTaTaaTTg 

Y DR258C.M M_20 1 T TcagTTagccaaaccTTcgcTTcal 

YDR258C_MM_249T TlTTTcggccaatcgaTcTTgaaTt 

YDR25 8C_M M_295T aTTgaccTcaaalcTTTcTTggaTa

YDR258C_MM_356

gatgaactcaggaggataggatclt 

YDR258C__MM_356T 

gaTgaacTcaggaggaTaggaTcTt 

YDR258C_MM_424 

accaicatcaccgaactttgtgtcg 

YDR258C_MM_424T 

accaTcaTcaccgaacTTTgTgTcg 

YDR258C_MM_433 

cccaacttigtgacgtnaataaaa 

YDR258C_MM_433T 

cccaacTTTgTgacgTTTaaTaaaa 

YDR258C_MM_486 

caatgatcglgiaacggaaaicaac 

YDR258Q_MM_486T 

caaTgaTcgTgtaacggaaaTcaac 

YDR258C_MM_515 

tggcctagggaaacggtcagcttac 

YDR258C_MM_5I5T 

TggccTagggaaacggTcagcTTac 

YDR258C_MM_566 

ttggaaacaicgcggigcgcuut 

YDR258C_MM_566T 

TTggaaacaTcgcggTgcgcT ill t 

YDR258C_MM_569 

gaaacatcggggagcgcttutcaa 

YDR258C_MM_569T 

gaaacaTcggggagcgcTTTTTcaa 

YDR2S8C_MM_604 

aaaacgacagcaaaaggci ucttc 

YDR258C_MM_604T 

aaaacgacagcaaaaggcTTTcTTc 

YDR258C_MM_63l 

gacagcctcagtaaattggccacca 

YDR258C_MM_631T 

gacagccTcagtaaaTTggccacca 

YDR258C_MM_686 

ccgattaaacgacagacagtatgci 

YDR258C_MM_686T 

ccgaTTaaacgacagacagTaTgct 

YDR258C_MM_757 

atcaaataggaaatcagctaaagcc 

YDR258CJWM__757T 

aTcaaaTaggaaatcagcTaaagcc 

YDR258C_MM_8I3 

gacctaagaacaaaaagctggcaat 

YDR258C_MM_813T 

gaccTaagaacaaaaagcTggcaat 

YDR258C_MM_823 

ataaagctggcattaggtcictttt 

YDR258C_MM_823T 

aTaaagcTggcaaaaggTcTcTTTt ! 

YDR258C_MM_870 

agacgtacagcaacagaaaiagcag 

YDR258CJWM_870T 

agacgTacagcaacagaaaTagcag 

YDR258C_MM_888 

aaatagcagcaaaggcctcgtcttg 

YDR258C_MM_888T 

aaaTagcagcaaaggccTcgTcTTg 

YDR258C_MM_890 

atagcagcaaigccctcgtcitggc 

YDR258C_MM_890T 

aTagcagcaaTgcccTcgTcTTggc 

YDR258C_MM_896 

gcaat ggcctcgacttggccaacga 

YDR258C_MM_896T 

gcaaTggccTcgacTTggccaacga 

YDR258C_MM_043 

TcTggTag mCamCgtcaagmCtTag 



TC 

Tat 

YDR258C_MM_LNA3_043 TtiGgtAgcAcgTcaAgcTtaGtat 

YDR258C_MM_078 

mCamCt mCtaamCagtatcgmCcgT 



TC 

tTmCTa 

YDR258C_MM_LNA3_078 mCacTctAacAgtAtcGccGtlTcta 

YDR258C_MM_124 

gTigmC mCaTggagat mCaaaaTmC 



TC 

tgTc 

Y DR 25 8C Jrf M^LN A3. 

.124 GttGccAtgGagAtcAaaAicTgtc 

YDR258C_MM_164 

TmCaaTggmCcTTggacmCaTaTa 



TC 

aTTg 

YDR258Q_MM_LNA3_ 

.164 TcaAtgGccTtgGacmCatAtaAttg 

YDR258C_MM_201 

tmCagTtagcmCaaacctTcgmCttm 



TC 

Cat 

YDR258CJUM_LNA3_20 1 TcaGtt AgcmCaaAccTtcGctTcat 

YDR258C_MM_249 

TiTtTmCggmCmCaatcgatmCtTg 



TC 

aaTt 

YDR258C_MM_LNA3_249 Tti*ncGgcmCaaTcgAicTcgAati 

YDR258C_MM_295 

aTTgamCmCTmCaaatcTTtmCTT 



TC 

ggaTa 

YDR258C_MM_LNA3_295 AtlCacmCtcAaaTctTtcTtgGata 

YDR258C_MM_356 

gaTgaamCTmCaggaggaTaggaTm 



TC 

CTt 

YDR258C_MM_LNA3_356 GatGaamCtcAggAggAtaGgaTcti 

YDR258C_MM_424 




TC 

amCcaTcaTcacccaaccTtgTgt mCg 

Y DR258CM MJLNA3_424 Acc Ate Ate AccGaamCttTgtGtcg 

YDR258C_MM_433 

mCfnCmCaacTTtgTgacgTTTaaT 



TC 

aaaa 

YDR258C__MM_LNA3_433 mCccAacTttGtgAcgTUAaiAaaa 

YDR258C_MM_486 

mCaaTgaTmCgTgttamCggaaaTm 



TC 

Caac 

YDR258C_MM_LNA3_486 mCaaTgaTcgTgtAacGgaAatmCaac 

YDR258C_MM_5!5 

TggmCcTagggaaacggTcagmCtTa 



TC 

c 

YDR258C_MM_LNA3_5 1 5 TggmCctAggGaaAcgGtc AgcTtac 

YDR258C_MM_566 

TTggaaa mCat mCgc ggTgmCgctTt 



TC 

Tt 

YDR258C,MM_LNA3_566 TtgGaaAcaTcgmCggTgcGctTtU 
YDR258C_MM_569 
TC 

YDR258C_MM_604 
TC 

YDR258C_MM_63l 
TC 

YDR258C_MM_686 
TC 

YDR258C_MM_757 
TC 

YDR258C_MM_8I3 
TC 

YDR258C_MM_823 
TC 

YDR258C_MM_870 
TC 

YDR258C_MM_888 
TC 

YDR258C_MM_890 
TC 

YDR258C_MM_896 
TC 


gaaamCaTmCggggag mCgciuTtm 
Caa 

aaaamCgamCagmCaaaaggmCTtT 
mCtTc 

ga mCagmCcTcagtaaaTTggcmCa 
cmCa 

mCmCgaTTaaamCgacagamCagT 
aTgmCi 

aTmCaaaTaggaaat mCagmCTaaag 
mCc 

gamCmCTaagaamCaaaaagmCTg 
gmCaat 

aTaaagmCTggmCattaggTmCTcT 
TTt 

aga mCgTamCagmCaacagaaaTag 
mCag 

aaaTag mCag mCaaagg mCcT mCg 
TctTg 

aTagmCagcaaTgcccTcgt mCtTggc 

gmCaaTggcctmCgactggccaamCg 

a 
YDR258C_MM_LNA3_569 GaaAcaTcgGggAgcCctTltTcaa 

YDR258C_MM_LNA3_604 AaaAcgAcaGcaAaaGgcTttmCttc 

Y DR258C_M M_LNA3_63 1 Gac AgcmCtc Agt AaaTlgGcc Acca 

YDR258C_MM_LNA3_686 mCcgAttAaamCgamCagAcaGtaTgci 

YDR258Q_MM_LNA3_757 AtcAaaTagGaaAtcAgcTaaAgcc 

YDR258C_MM_LNA3_81 3 GacmCtaAgaAcaAaaAgcTggmCaat 

YDR258C_M M_LNA3_823 Ata AagmCtgGcaTtaGgtmCtcTm 

YDR258C_MM_LNA3_870 AgamCgtAcaGcaAcaGaaAiaGcag 

AaaTag mCagmCaaAggmCct mCgt mCt 
YDR258C_MM_LNA3_888 Ig 

YDR258C_MM_LNA3_890 AtaGcaGcaAtgmCccTcgTctTggc 
YDR258C_MM_LNA3_896 GcaAtgGccTcgActTggmCcaAcga 


Table 14. Duplex melting temperatures (Tm) for the 144 different 25-mer oligonucleotide 
capture probes. The design column denotes the sequence design of the probe. PM = perfectly 
matched probe, MM = single mismatch probe. LNA substitutions are depicted by capital letters; 
mC denotes LNA methyl-C. 


Probe 
number 

Oligonucleotide target name 

Complementary target sequence 

Design 

Average 
duplex 
melting 
temp. (°C) 

12696 

YDR258Cjrm_predic_043 

utggt agcacglcaagcitagtat 

MM-DNA 

59.2 

12699 

Y DR25 8CJTm_predic_078 

cact c t aacagiat cgccgt t tcta 

MM-DNA 

61.1 

12694 

YDR25 8C_Tm_predic_ 1 24 

gugccatggagaKaaaatctgtc 

MM-DNA 

60 

12700 

YDR258C_Tm_predic_164 

tcaatggccuggaccatataattg 

MM-DNA 

59.5 

12693 

YDR258C_Tm_predic_201 

tcagttagccaaaccttcgcitcat 

MM-DNA 

61.4 

12698 

YDR258CJTm_predic_249 

itutcggccaatcgatcttgaatt 

MM-DNA 

58.8 

12702 

YDR258C_Tm_predic_295 

atigacctcaaatccttcctggata 

MM-DNA 

54.5 

12692 

YDR258CJTm_predic_356 

gatgaactcaggaggataggatctt 

MM-DNA 

58.2 

12680 

YDR258C_Tm_predic_424 

accaicatcaccgaacittgigtcg 

MM-DNA 

63.6 

12703 

YDR258CJTm_predic_433 

cccaactttgtgacgt Itaataaaa 

MM-DNA 

56.2 

12681 

Y DR25 8C_Tm_predic_486 

zaatgaicgtgtaacggaaatcaac 

MM-DNA 

59.3 
12682 

YDR258C_Tm_predic_5 1 5 

LeecctacEeaaaceetcaecttac 

MM-DNA 

65.2 

12683 

YDR25 8C_Tm_predi c_566 

tieeaaacatcgc eetRcccttuc 

MM-DNA 

61.5 

12684 

YDR258C_Tm_predic_569 

gaaacai c ggggagc gcu tttcaa 

MM-DNA 

63 8 

12701 

YDR25 8C_Tm_p red i c_604 

aaaac gacagcaaaaggctt tcttc 

MM-DNA 

58.8 

12685 

YDR258C__Tm_predic_63 1 

gacagcctcagtaaattggccacc a 

MM-DNA 

64.8 

12686 

YDR258C_Trn_predic_686 

ccgatta a acgacagacagt at get 

MM-DNA 

57.1 

12687 

YDR258C_Tm_predicJ757 

atcaaat ag gaaaicagc taaagee 

MM-DNA 

54.3 

12688 

YDR258C__Tm_predic_8 1 3 

gacctaagaacaaaaagctggcaat 

MM-DNA 

58.2 

12697 

YDR258C_Tm_predic_823 

aiaaagctggcatiaggtctciut 

MM-DNA 

58.4 

12695 

YDR258CJTrn_predic_870 

aeacetacaEcaacaeaaataccac 

MM-DNA 

61.2 

12689 

YDR258C_Tm_predic_888 

aaatagcagc aa aggcctcgtcttg 

MM-DNA 

64 

12690 

YDR258C Tm_predic 890 

a ta pea pcaal pcccIc elcH e ec 

MM-DNA 

62.9 

12691 

Y DR2 58C_Tm_pred ic_896 


MM-DNA 

68.5 

12720 

YDR258C Tm Dredic 043T 

TTTppTapcacetcaaecTTapTat 

MM-T 

68.6 

12723 

YDR258C Tm_predic 078T 

cacTcTaac agtaicgccg I " I " I cTa 

MM-T 

69.7 

12718 

YDR258C Tm predic I24T 

pTTpccaTepaeatcaaaaTcTpTc 

MM-T 

69.2 

12724 

YDR258C Tm Dredic 164T 

Tc aaTcrcccTTeeaccaTaTaaTTe 

MM-T 

69.7 

12717 

YDR258C Tm Dredic 20 IT 

T"p a oTTa Pf~f*aa a w'I'I'p cr rTTV* a t 

a Wag 1 1 agWKMflVtV 1 I Vgk ■ 1 will 

MM-T 

69.4 

12722 

YDR258C Tm Dredic 249T 

Tt ' I w 1 ' I'r p pceaa t roaTcTTp aaTl 
in i j vgjjv»va<ivv.g«j i vi * tS"" ■ * 

MM-T 

65.9 

12726 

YDR258C Tm Dredic 295T 

aTTpaccTcaaatcTTTcTTe oaTa 

MM-T 

65.1 

12716 

YDR258C Tm predic 356T 

paT p a a rTc*a p p a p p aTa p p a Tf* Tt 

MM-T 

64.9 

12704 

YDR258C Tm predic 424T 

accaTcftTcacccaacTTT pTpTcb 

MM-T 

74 

12727 

YDR258C Tm oredic 433T 

ccc aacTTl'eTo aceTTTa aTaaa a 

MM-T 

66.5 

12705 

Y DR258C_Tm_pred ic_486T 

caaTgaTcgTgt aacggaaaTcaac 

MM-T 

65.2 

12706 

YDR258C Tm _predic 515T 

T*p pcpTap p paa ar ocTcac cTTac 

MM-T 

71.6 

12707 

YDR258C_Tm _predic_566T 

TTe eaaacaTceceeTccBcTTTTi 

i MM-T 

68.6 

12708 

YDR258C_Tm_pred ic_569T 

paaacaTcBO fie a ec PcTTTTTcaa 

MM-T 

69.9 

12725 

YDR258C_Tm_4>redic_604T 

aaaacgacagcaaaaggcTTTcTTc 

MM-T 

65.9 

12709 

YDR258C Tm predic 63 IT 

para p rcTca PtaaaTT p pccacca 

MM-T 

68.8 

12710 

YDR258C_Tm_predic_686T 

ccgaTTaaacgacagacagTaTgct 

MM-T 

63.5 

1271 1 

YDR258C_Tm_predic_757T 

aTc a aaTaggaaatc agcTaaagcc 

MM-T 

58.1 

12712 

YDR258C_Tm_predic_8 1 3T 

gaccTaagaacaaaaagcTggcaat 

MM-T 

61.1 

12721 

YDR258C_Tm_predic_823T 

aTaaaficTcficaaaafiBTcTcTTTt 

* Wltl| p* ■ OO OO * ■ ■ 

MM-T 
13+14 

67.2 

12719 

Y DR25 8C_Tm_pred ic_870T 

agacgTacagc aacagaaaTagcag 

MM-T 

65 

12713 

YDR258C_Tm_predic_888T 

aaaTagcagcaaaggccTcgTcTTg 

MM-T 

70.4 

12714 

YDR258C_Tm_predic_890T 

aTagcagcaaTgcccTcgTcTTggc 

MM-T 

71.3 

12715 

YDR258C_Tm_predic_896T 

ecaaTceccTceac T J ccccaacca 

D OO • eft O 

MM-T 

73.7 

12744 

YDR25 8C JTm_predic J343TC 

TtTggTagmCamCgtcaagmClTagTat 

MM-TC 

73.3 

12747 

YDR258C_Tm_predic_078TC 

mCa mCt mCtaamCagtaTcgmCcgTtTmCTa 

MM-TC 

75.2 

12742 

YDR258C_Tm_predic_l 24TC 

gTtgmCmCaTggagatmCaaaaTmCtgTc 

MM-TC 

61.6 

12748 

YDR258C_Tm_predic_)64TC 

TmCaaTgg mCcTTggac mCaTaTaaTTg 

MM-TC 

74.8 

12741 

YDR258CJTm_predic_201TC 

LmCagTtagcmCaaacctTcgmCumCat 

MM-TC 

70.6 

12746 

YDR258C_Tm_predic_249TC 

riTtTmCggmCmCaatcgaimCcTgaaTi 

MM-TC 

71 

12750 

YDR258C_Tm_predic_295TC 

aTTgamC mCTmCaaat cTTt mCTTgg aTa 

MM-TC 

72.2 
12740 

YDR258C Tm oredic 356TC 

paTeaamPTmCaeenffonTapffaTmCmt 

MM-TC 

70.4 

12728 

YDR258C_Trn_predic_424TC 

a mCcaTcaTcacccaaccTtgTgttnCg 

MM-TC 
13+16 

70.2 

12751 

YDR258C_Tm_prcdic_433TC 

mC mC mCaacTTt gTgac gill aaTaaaa 

MM-TC 

67.6 1 

12730 

YDR258CJTm w predic_5 1 5TC 

Tgg mCcTaggg a aac ggTc ag mClTac 

MM-TC 

75.5 

12731 

YDR258C.Tm_predic_566TC 

TTggaaa mCatmCgcggTgmCgctTtTt 

MM-TC 

72 

12732 

YDR258C_Tnupredic_569TC 

gaaamCaTrrtCggggagmCgctiiTtrnCaa 

MM-TC 

74.8 

12749 

YDR258C_Tm_predic_604TC 

aaa a mCga mCag mCoaaagg mCTiTmCtTc 

MM-TC 

72 

12733 

YDR258C_Tm_predic_63 1 TC 

ga mCag mCcTc agt aa aTTgg c mCac mCa 

MM-TC 

77.4 

12734 

YDR258C_Tm_predtc_686TC 

mC m C gaTTaaa mCgacag a mCagTaTgmCt 

MM-TC 

70.2 

12735 

YDR258C Tm oredic 757TC 

aT mC aaaTa p p aaal mCac mCTaaa e mCc 

MM-TC 

64.6 

12736 

YDR258C_Tm_predic_8 1 3TC 

gamC mOTaagaa mC aaaaagm CTgg mCaat 

MM-TC 

71.4 

12745 

YDR258C Tm oredic 823TC 

aTaaac mfTo p mCat t a ppTmCTf' I " 1 " 1 "t 

MM-TC 

74 

12743 

YDR2S8C Tm nredic 870TC 

a crp mfpTa mraprnfaaraaaaflTaomflflo 

[iKulllWC 1 Ql I IwttftlilWfllHrOiJCilOu 1 CI c, 1 1 IxAlfc 

MM-TC 

73.6 

12737 

YDR258C Tm oredic 888TC 

a a aTa o mPa p mCa a»po mf cTm C pTciTp 

MM-TC 

79.3 

12738 

YDR258C_Tm predic 890TC 

aTflo mPflffr a ATocerTr pi mPi TPp pc 

MM-TC 

71.5 

12739 

YDR258C Tm predic 896TC 

praafppprtmPpamCttPPCcaacpa 

MM-TC 

73.1 

12768 

YDR258C Tm oredic 043 PM 

tttPPiAP^a^PAcaapcttaptat 

PM-DNA 

65.7 

12771 

YDR2S8C Tm oredic 078 PM 

ractctaacaotttcficcetttcta 


66.3 

12766 

YDR2SRC Tm oredic 124 PM 

aft crrr-a toon oft raa aat r f Ptf- 

PM-DNA 

65.8 

12772 

YDR2S8C Tm oredic 164 PM 

1 L/l\fcJQW_ m III l/l WU1L 1 V#*T r It| 

(pga 1 o pf*r"t t o r net*. a t at a a 1 1 p 

PM-DNA 

64 

12765 

YDR258C Tm oredic 201 PM 

teapt taprraa t re 1 1 r pet teat 

PM-DNA 

66.1 

12770 

YDR2^8C Tm oredic 249 PM 

rttrt r"p prr»a a an patct t o aatt 

PM-DNA 

65 

12774 

YDR258C Tm oredic 295 PM 

anparrtf aaaartttrftppata 

PM-DNA 

61.8 

12764 

YDR258C Tm oredic 356 PM 

pat paartr*a ppt ppatap pate t r 

5 *o oo oo oo 

PM-DNA 

64 

12752 

YDR258C_Tm predic_424_PM 

accatcatcacccaac tttgi gi eg 

PM-DNA 

67.5 

12775 

YDR258C_Tm predic_433_PM 

cccaactt t pi etc et ( laaiaaaa 

^VvimVIHCICIvR 1 * ************** 

PM-DNA 

61 

12753 

YDR258C_Tm predic_486_PM 

caatgatcgtgttacggaaatcaac 

PM-DNA 

64.2 

12754 

YDR25 8C_Tm predic_5 1 5_PM 


PM-DNA 

70.1 

12755 

YDR258C_Tm predic_566_PM 

[tccaaacatccfiCElececttttt 

PM-DNA 

70.8 

12756 

YDR258CjTm predic_569_PM 


PM-DNA 

68.7 

12773 

YDR258C_Tm predic_604_PM 

aaaacgacagcataaggcttlci tc 

PM-DNA 

64.5 

12757 

YDR258C_Tm predic_63 1 _PM 

2acaecctca2ttaattESccacca 

□"•^o c bo 

PM-DNA 

70.2 

12758 

YDR258C_Tm predic_686_PM 

cccattaaaceaeaeacaetatect 

PM-DNA 

65.4 

12759 

YDR258C_Tm predic_757_PM 

atcaaataggaatccagctaaagcc 

PM-DNA 

61.5 

12760 

YDR258C_Tm predic_813_PM 

gacct aag aac a t aaagciggcaa t 

PM-DNA 

63.5 

12769 

YDR258C_Tm predic_823_PM 

ataaagctggcaai aggicicutt 

PM-DNA 

63.6 

12767 

YDR258C_Tm predic_870_PM 

agacgtacagcatcagaaatagcag 

PM-DNA 

66.6 

12761 

YDR258C_Tm predic_888_PM 

aaataecaecaatcficctcetctte 

PM-DNA 

69.2 

12762 

YDR258C_Tm predic_890_PM 

atagcagcaatggcctcgicitggc 

PM-DNA 

72.7 

12763 

YDR258CJTm predic_896_PM 

gcaatggccicgtcttggccaacga 

PM-DNA 

72.8 

12792 

YDR258C_Tm predic_043T_PM 

TTTggTagcacgacaagcTTagTat 

PM-T 

74.2 

12795 

YDR258CJTm predic_078T_PM 

cracTcTaacagtTtcgccgTTTcTa 

PM-T 

75.5 

12790 

YDR258C_Tm predic. 1 24TJ>M 

gTTgccaTggagTtcaaaaTcTgTc 

PM-T 

74.7 

12796 

YDR258CJTm predic. I64T.PM 

rcaaTggccTTgcaccaTaTaaTTg 

PM-T 

74 
12789 

YDR258C_Tm predic_201T_PM 

TcapTTa occaaTrrTTr pcTTrat 

PM-T 

75.7 

12794 

YDR258C Tm nredic 249T PM 

■ L/l\fcJUV_ 1 III pi S.l*I*»_*i" , T^ 1 1 ITI 

TlTTTc o o rroaar o a Tr TTpflflTt 

■ till Jj^i if 2 *" 1 tj 1L 1 I gua ■ I 

PM-T 

71 

12798 

YDR258C_Tm prcdic_295T_PM 

aTTp aecTraaa acTTTrTTp paTa 

PM-T 

70.2 

12788 

YDR258C_Tm predic_356T_PM 

eaTo aacTca ppTpo aTa p paTrTfc 

PM-T 

71 

12776 

YDR258C_Tm predic_424T_PM 

accaTcaTc acccaaeTTT pTpTc e 

PM-T 

79.2 

12799 

YDR258C Tm Dredie 433T PM 

cccaacTTTpToTc pTTTaaTaaaa 

PM-T 

75.1 

12777 

YDR258C Tm oredtc 486T PM 

wwToflTfoTolTflpp oaaaTcAAC 

uoa ■ Ka ■ VK * K» ■ uwKyklfl I Wonw 

PM-T 

72.2 

12778 

YDR258C_Tm predicts 15T PM 

TppccTflPPPflaTcppTrra PcTTac 

PM-T 

76.9 

12779 

YDR258C_Tm predic_566T_PM 

1*1 pan aacaTcp pppTpcpc I " 1 " I " 1 f 

PM-T 

76.7 

12780 

YDR258C Tm nredic 569T PM 

paafl caTc ppppT p c pcTTTTTc aa 

gdaaLa i Lgggg 1 5^5^ ' III ii»oo 

PM-T 

78 

12797 

YDR258C Tm nredic 604T PM 

anjuw on r a Di*aTaaoorTTTrTTf 
artudl. ga.L.a*giwo ■ dagjgc I i 1 1> I 1 1» 

PM-T 

717 

12781 

YDRTSRP Tm nredic fillT PM 
■ i/K*joy i fit picLiic__u.> i i i ivi 

nnra trr"/^T/»!a otTa qTTo nrr a r*r*o 

PM-T 

75.4 

12782 

YDR2SRP Tm nredic fiftfVT PM 

I UI\IJOv_ 1 111 |JI VM UOU 1 r JVI 

*ya 11 oQO^AonaAAMATaTft^l 
LCga 1 1 <Ui<n.gdgdg<u;<Jg i a i gt-i 

PM-T 

72.2 


YDR2SRP Tm nredic 7S7T PM 

«T<»o O «To Aftaa 1 ~ I >« i»/«To « 4 o/» /• 

a 1 Ldild l ajg'gaa 1 1 1 daalJCb 

PM-T 

Ou.J 

12784 

YDR2SRP Tm nredic RHT PM 

OQC^Ta QftaaMTooaarToorail 
gaLL l ddgauLd I adUgl. 1 JggCdat 

PM-T 

D/.J 

12793 

YnR9SftP Tm nrpdir R9TT PM 

r,Tn a n a cToor-ci aTa o oTrTr I * 1 ' 1 1 

PM-T 

74. 1 


YDR2SRP Tm nredic R7lYT PM 

I UI\iJOVrf_ 1 III pi CUIL__0 / \J 1 rxVI 

a crap nTft/*Q<t^ qTVq mi n oTqo/* a n 

PM T 


1 9"7RS 

Yr\P9SflP Tm rtrwlir fififlT PM 

aa aTagca gcaaTggccTcgTcTTg 

PM T 
rfVl- 1 

77 9 

197Rfi 

1 £. i OU 

YDR2^RP Tm nredic RQOT PM 

I UI\*.JOv_ 1 III piCUIL._07\S 1 r JVI 

aTa or 1 a ormTo cTr*r*Tr* aVn' Wo on 

a i dgcdgcdd i ggcx i eg i c i i ggc 

PM-T 

1 IVI - 1 

sn i 

Ow. 1 

12787 

YHR2SRP Tm nredic RQrVT PM 
I LyI\i.JOl— 1 III piCUIL. 0 j»v> 1 rlVI 

r»i~o'»Trro/»r»T/»oTr»'Tr*Tc*or»r»o«ir»c»«a 

gcaa i ggee i eg i c i i ggecaaega 

PM T 

r JVI- 1 

SO R 

1 £D 1 O 

Yr>R9SflP Tm nrerfir fldlTf"' PM 

ill gg i agni^ami^gani^aagmv-.i i ag ■ ai 

PM TC 


12819 

YDR9SRP Tm nredic 07RTP PM 

mPamPl , f*iPloiifriPan , tT»/-f»mPr«cTTlTrri f 'l a 

PM-TC 

81 9 

1 9R1 d 

YHB9SRP Tm nnv1i/< \0ATC PM 

g 1 igmL.m^a 1 ggag 1 irnv.aaaa 1 mi^ig 1 c 

PM TC 

7R 1 

12820 

YDR95RP Tm nredir IrvdTP PM 
■ L/i\iJo^_ i m prt\iii»_ i oh i v- i jvi 

1 lll\_<xd 1 ggmiwC 1 1 gLIl*wOt.Jll*wa I a 1 aa 1 1 g 

PM.TP 

R^ ^ 

12813 

YDR9SRP Tm nredic 901TP PM 

I L/IMJOC 1 III p 1 CU 1 L £\i 1 1 v»_ri»i 

TmpQoT>aor , mPimTrTnPtTmPomPrr mPal 
1 lllv^ag i IdgLulV-da 1 L- 1 1 TV. 1 1 II IVg 1 1 IV^H 1 HV^dl 

PM-TP 

Rl R 

12818 

YDR2S8C Tm nredic 249TP PM 

TfTtTmPaomPmPaaamPoaf mP»Tonn*Tt 
ILlll Illv^g^iIIV-.III'w.aailLI IV^gdl IIIV^l 1 £ilu ■ I 

PM-TC 

77.8 

12822 

YDR2SRP Tm nredic 9QSTP PM 

I L/I\4JOV* 1111 piCVJIL. 47J 1 V-^ r 1V| 

a" 1 "1 "on mPmPTmPnaan m( ' I I t mTTTiToaTa 
all gaIllv^IIIV^> 1 Illv^aaadll 1 1 III IV — 1 1 ggd * a 

PM-TC 

75.6 

12812 

YDR2S8C Tm nredic T^cVTP PM 

I l/l\AJOV . 1 III pi Will J-JvE 1 I IVI 

crnTcrnnnnf * 1 mPacroTotraTa rr onTmPTf 
gd 1 gddlll\_> 1 IIIV_dgg 1 ggu 1 aggd 1 Ill\_ I I 

PM-TC 

77.3 

12800 

YDR258C Tm predic 424TC PM 

a mCcaTcaTcac mC mCaaccTtgTgt mCg 

PM-TC 

74.2 

12823 

YDR258C_Tm predic_433TC_PM 

mCmP mPaacTTt pTpTc pTTTaaTa a aa 

PM-TC 

74.8 

12729 

Y DR258C_Tm_predic_486TC_PM 

mCaaTgaTmCgTgtiamCggaaaTmCaac 

PM-TC 

75.5 

12802 

YDR258C_Tmpredic 515TC PM 

Tp o mCcTao apaaTmP ppTca p mPiTac 

PM-TC 

83.6 

12803 

YDR258C Tm nredic 566TC PM 

TTeoaaamPai mPppppTpmPpciTiTi 

PM-TC 

80.8 

12804 

YDR258C Tm nredic 569TC PM 

paaamPaTmPpp PpTpmP PctltTt mPaa 

giuUllllWil A IIIWLCLU IKII l^f CWll lit IIIVuu 

PM-TC 

83.2 

12821 

YDR258C_Tm predic_604TC_PM 

aaaa mCg a mCag mCaTaaggmCTlTmCl Tc 

PM-TC 

79 

12805 

YDR258C_Tm predic_63 ITQ_PM 

garnCagmCcTcagtTaaTTggcmCacmCa 

PM-TC 

82.5 

12806 

YDR258C_Tm predic_686TC_PM 

mCmC gaTTa aa mC gagagamCagTaTgmCt 

PM-TC 

79.4 

12807 

YDR258C Tm predic_757TC_PM 

aTmPaaaTappaaTt mpapmPTaaapmPc 

PM-TC 

71.4 

12808 

YDR258C_Tm predict 1 3TC_PM 

ga mC mCTaagaamC aTaaagmCTgg mCaal 

PM-TC 

78.9 

12817 

YDR258C_Tm predic_823TC_PM 

aTaaagmCTggmCaaTaggTmCTcTTTi 

PM-TC 

81.2 

12815 

YDR258C_Tm predic_870TC_PM 

agamCgTamCag mCaTmCagaaaTagmCag 

PM-TC 

81.9 

12809 

YDR258C_Tm predic_888TC_PM 

aaaTagmCagmCaaTggmCcTmCgTctTg 

PM-TC 

86.8 

12810 

YDR258C_Tm predic_890TC_PM 

aTagmCagcaaTggmCcTcgtmOTggc 

PM-TC 

83 

12811 

YDR258CjTm predic_896TC_PM 

geaatggee t mCgTmQ i ggecaaega 

PM-TC 

79.3 
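
One way to read Table 14 is to compare, for a given probe position, the melting temperature of the perfectly matched (PM) duplex with that of the single-mismatch (MM) duplex. The sketch below does this for the probe at position 424, using the six values listed for it in Table 14 (MM-DNA 63.6, PM-DNA 67.5, MM-T 74, PM-T 79.2, MM-TC 70.2, PM-TC 74.2). The dictionary layout and the function name `delta_tm` are illustrative choices, not part of the original analysis.

```python
# Tm values (deg C) for probe position 424, as listed in Table 14.
# Keys are (design, match type); the layout is an illustrative assumption.
TM_424 = {
    ("DNA", "MM"): 63.6, ("DNA", "PM"): 67.5,
    ("T", "MM"): 74.0,   ("T", "PM"): 79.2,
    ("TC", "MM"): 70.2,  ("TC", "PM"): 74.2,
}


def delta_tm(tm_table, design):
    """Return Tm(PM) - Tm(MM) for one probe design, rounded to 0.1 deg C."""
    return round(tm_table[(design, "PM")] - tm_table[(design, "MM")], 1)


# Mismatch penalty per design for this one probe position.
deltas = {design: delta_tm(TM_424, design) for design in ("DNA", "T", "TC")}
print(deltas)
```

A single probe position is only an illustration; any conclusion about the designs would need the same difference computed across all positions in the table.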
Example 52. Performance analysis of LNA substituted oligonucleotide capture probes designed 
to detect splice variants in complex RNA pools. 

A. Oligonucleotide design for microarrays. The methods for designing exon-specific internal 
oligonucleotide capture probes have been described in example 2. 

A1. Design of the LNA-modified capture probes 

For the internal LNA-modified oligonucleotide capture probes, every third DNA 
nucleotide was substituted with an LNA nucleotide. The probes designed to capture the splice 
junction of the recombinant splice variants were designed with LNA substitutions at every third 
nucleotide position. All capture probes are shown in Table 15. 
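
The every-third-nucleotide substitution can be sketched in a few lines. The function below is a minimal illustration, not the design software used for the probes: the name `lna_every_third`, the `offset` parameter and the `protect_3prime` flag are assumptions. The offset of 1 for the 40-mers and the unmodified 3'-terminal base are inferred from the probe sequences listed in this document rather than stated in the text. LNA positions are written as capital letters and LNA methyl-cytosine as mC, following the table captions.

```python
def lna_every_third(seq, offset=0, protect_3prime=True):
    """Mark every third position as LNA, starting at `offset` (0-based).

    LNA bases are written in upper case and LNA methyl-C as 'mC'.
    If `protect_3prime` is True the last base is always left as DNA
    (an assumption inferred from the listed 25-mer probes).
    """
    seq = seq.lower()
    last = len(seq) - 1
    out = []
    for i, base in enumerate(seq):
        is_lna = i >= offset and (i - offset) % 3 == 0
        if protect_3prime and i == last:
            is_lna = False
        if is_lna:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


# 50-mer probe gene78.01a (substitution starts at the first base)
probe_50 = "cctgaaagtagatttgttatttccgaaacgccttctcccgttcttaagtc"
# 40-mer probe gene78.01a_40 (substitution starts at the second base)
probe_40 = "aagtagatttgttatttccgaaacgccttctcccgttctt"

print(lna_every_third(probe_50))
print(lna_every_third(probe_40, offset=1))
```

Run on the unmodified gene78.01a sequences, the sketch reproduces the corresponding _LNA3 entries of Table 15.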

Table 15. Internal, exon-specific and merged, exon-exon splice junction specific 
oligonucleotide capture probes used in the example 51. Capital letters denote LNA nucleotides 
and mC denotes LNA methyl-cytosine. 


EQ No  Oligo Name                Sequence 

10716  >gene78.01a               cctgaaagtagatttgttatttccgaaacgccttctcccgttcttaagtc 
10717  >gene78.01b               catataccacaaatagtccctcaaaaatcacaagaaaactcacaacactg 
10718  >gene78.03a               gatttgcagcggtggtaaaaagtatgaaaacgtggtaattaaaaggtctc 
10719  >gene78.03b               ccaatgaaaactaatcaaaggtaaacgtggatcccatggcaattcccggg 
10720  >gene78.m0103             cacaacactgcccagaggttcaatcgataaatatgtgaaggaaatgcctg 
10721  >gene78.m01INS3           caacactgcccagaggttcaatcgatccgatgatcctaatgaaggcgccc 
10722  >gene78.mINS303           gtccagtatcgtccatcatagtatcgataaatatgtgaaggaaatgcctg 
10723  >gene78.INS3              ctccttcttgcattcttcaacttccttcaacacttgagcggagtcggtgc 
10724  >gene78.m01INS4           caacactgcccagaggttcaatcgatgtgtgataggatcagtgttcaggg 
10725  >gene78.mINS403           gaaggcgaaggagactgctaatatcgataaatatgtgaaggaaatgcctg 
10726  >gene78.INS4a             gaacgtatgagcatgcgagagacgctgtagttggaaaaacccacgaagcg 
10727  >gene78.INS4b             gaaaccgctgattatactgcggagaaggtgggtgagtataaagactatac 
11345  >gene78.01a_40            aagtagatttgttatttccgaaacgccttctcccgttctt 
11346  >gene78.01b_40            accacaaatagtccctcaaaaatcacaagaaaactcacaa 
11347  >gene78.03a_40            gcagcggtggtaaaaagtatgaaaacgtggtaattaaaag 
11348  >gene78.03b_40            gaaaactaatcaaaggtaaacgtggatcccatggcaattc 
11349  >gene78.m0103_40          cactgcccagaggttcaatcgataaatatgtgaaggaaat 
11350  >gene78.m01INS3_40        ctgcccagaggttcaatcgatccgatgatcctaatgaagg 
11351  >gene78.mINS303_40        gtatcgtccatcatagtatcgataaatatgtgaaggaaat 
11352  >gene78.INS3_40           tcttgcattcttcaacttccttcaacacttgagcggagtc 
11353  >gene78.m01INS4_40        ctgcccagaggttcaatcgatgtgtgataggatcagtgtt 
11354  >gene78.mINS403_40        cgaaggagactgctaatatcgataaatatgtgaaggaaat 
11355  >gene78.INS4a_40          tatgagcatgcgagagacgctgtagttggaaaaacccacg 
11356  >gene78.INS4b_40          cgctgattatactgcggagaaggtgggtgagtataaagac 
11357  >gene78.01a_50_LNA3       mCctGaaAgtAgaTttGttAttTccGaaAcgmCctTctmCccGttmCttAagTc 
11358  >gene78.01b_50_LNA3       mCatAtamCcamCaaAtaGtcmCctmCaaAaaTcamCaaGaaAacTcamCaamCacTg 
11359  >gene78.03a_50_LNA3       GatTtgmCagmCggTggTaaAaaGtaTgaAaamCgtGgtAatTaaAagGtcTc 
11360  >gene78.03b_50_LNA3       mCcaAtgAaaActAatmCaaAggTaaAcgTggAtcmCcaTggmCaaTtcmCcgGg 
11361  >gene78.m0103_50_LNA3     mCacAacActGccmCagAggTtcAatmCgaTaaAtaTgtGaaGgaAatGccTg 
11362  >gene78.m01INS3_50_LNA3   mCaamCacTgcmCcaGagGttmCaaTcgAtcmCgaTgaTccTaaTgaAggmCgcmCc 
11363  >gene78.mINS303_50_LNA3   GtcmCagTatmCgtmCcaTcaTagTatmCgaTaaAtaTgtGaaGgaAatGccTg 
11364  >gene78.INS3_50_LNA3      mCtcmCttmCttGcaTtcTtcAacTtcmCttmCaamCacTtgAgcGgaGtcGgtGc 
11365  >gene78.m01INS4_50_LNA3   mCaamCacTgcmCcaGagGttmCaaTcgAtgTgtGatAggAtcAgtGttmCagGg 
11366  >gene78.mINS403_50_LNA3   GaaGgcGaaGgaGacTgcTaaTatmCgaTaaAtaTgtGaaGgaAatGccTg 
11367  >gene78.INS4a_50_LNA3     GaamCgtAtgAgcAtgmCgaGagAcgmCtgTagTtgGaaAaamCccAcgAagmCg 
11368  >gene78.INS4b_50_LNA3     GaaAccGctGatTatActGcgGagAagGtgGgtGagTatAaaGacTatAc 
11369  >gene78.01a_40_LNA3       aAgtAgaTttGttAttTccGaaAcgmCctTctmCccGttmCtt 
11370  >gene78.01b_40_LNA3       amCcamCaaAtaGtcmCctmCaaAaaTcamCaaGaaAacTcamCaa 
11371  >gene78.03a_40_LNA3       gmCagmCggTggTaaAaaGtaTgaAaamCgtGgtAatTaaAag 
11372  >gene78.03b_40_LNA3       gAaaActAatmCaaAggTaaAcgTggAtcmCcaTggmCaaTtc 
11373  >gene78.m0103_40_LNA3     cActGccmCagAggTtcAatmCgaTaaAtaTgtGaaGgaAat 
11374  >gene78.m01INS3_40_LNA3   cTgcmCcaGagGttmCaaTcgAtcmCgaTgaTccTaaTgaAgg 
11375  >gene78.mINS303_40_LNA3   gTatmCgtmCcaTcaTagTatmCgaTaaAtaTgtGaaGgaAat 
11376  >gene78.INS3_40_LNA3      tmCttGcaTtcTtcAacTtcmCttmCaamCacTtgAgcGgaGtc 
11377  >gene78.m01INS4_40_LNA3   cTgcmCcaGagGttmCaaTcgAtgTgtGatAggAtcAgtGtt 
11378  >gene78.mINS403_40_LNA3   cGaaGgaGacTgcTaaTatmCgaTaaAtaTgtGaaGgaAat 
11379  >gene78.INS4a_40_LNA3     tAtgAgcAtgmCgaGagAcgmCtgTagTtgGaaAaamCccAcg 
11380  >gene78.INS4b_40_LNA3     cGctGatTatActGcgGagAagGtgGgtGagTatAaaGac 


B. Printing and coupling of the splice isoform-specific microarrays 

The splice variant capture probes were synthesized with a 5' anthraquinone (AQ)-modification, 
followed by a hexaethyleneglycol-2 (HEG2) linker. The capture probes were first diluted to a 
20 µM final concentration in 100 mM Na-phosphate buffer pH 7.0, and spotted on the 
Immobilizer polymer microarray slides (Exiqon, Denmark) using the Biochip Arrayer One 
(Packard Biochip Technologies, USA) with a spot volume of 2 × 300 pl and 300 µm between 
the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in 
a Stratalinker with 2300 µjoules (Stratagene, USA). Non-immobilized capture probe 
oligonucleotides were removed from the slides by washing the slides two times for 15 min in 
1× SSC. After washing, the slides were dried by centrifugation at 1000 × g for 2 min, and stored 
in a slide box until microarray hybridization. 

C. Construction of the Splice Variant Clones 

The recombinant splice variant constructs were cloned into the Triamp18 vector (Ambion, 
USA). The constructs were sequenced to confirm their construction. The plasmid clones were 
transformed into E. coli XL10-Gold (Stratagene, USA). 

Genomic DNA was prepared from a wild type standard laboratory strain of Saccharomyces 
cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences, USA) 
according to the supplier's instructions. Amplification of the partial yeast gene was done by 
standard PCR using yeast genomic DNA as template. In the first step of amplification, a 
forward primer containing a restriction enzyme site and a reverse primer containing a universal 
linker sequence were used. In this step 20 bp was added to the 3'-end of the amplicon, next to 
the stop codon. In the second step of amplification, the reverse primer was exchanged with a 
nested primer containing a poly-T20 tail and a restriction enzyme site. The SWI5 amplicon 
contains 730 bp of the SWI5 ORF plus 20 bp universal linker sequence and a poly-A20 tail. 
The PCR primers used were: 
YDR146C-For-EcoRI: acgtgaattcaaatacagacaatgaaggagatga 
YDR146C-Rev-Uni: gatccccgggaattgccatgttacctttgattagtmcattggc 
Uni-polyT-BamHI: acgtggatccttttttttttttttttttttgatccccgggaattgccatg 

The PCR amplicon was cut with the restriction enzymes EcoRI + BamHI. The DNA 
fragment was ligated into the pTRIamp18 vector (Ambion, USA) using the Quick Ligation Kit 
(New England Biolabs, USA) according to the supplier's instructions and transformed into E. 
coli DH5α by standard methods. 

C1. Construction of the recombinant splice variant #1 (Triamp18/swi5-rubisco) 

The Arabidopsis thaliana Rubisco small subunit ssu2b gene fragment (gi 17064721) was 
amplified from genomic DNA by primers named DJ305 5'-ACTATGATGGACGATACTGGAC-3' 
and DJ306 5'-ATTGGATCGATCCGATGATCCTAATGAAGGC-3', containing ClaI restriction site linkers. 
The purified PCR fragment was digested with ClaI and then cloned into the swi5 (gi:7839148) 
vector at the unique ClaI site (atcgat), giving each insert a flanking sequence from the original 
yeast SWI5 insert (named exon01 and exon03, see Figure 48). The product was inserted in the 
reverse orientation, so that the insert sequence is: 

atcgatCCGATGATCCTAATGAAGGCGCCCGGGTACTCCTTCTTGCATTCTTCAACTTC 
CTTCAACACTTGAGCGGAGTCGGTGCATCCGAACAATGGAAGCTTCCACATTGTCC 
AGTATCGTCCATCATAGTatcgat 
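
The relation between the two primers and the insert sequence can be checked with a reverse complement. The sketch below is only an illustration: the helper `reverse_complement` and the variable names are assumptions, and the primer and insert sequences are copied from the text above. It tests that the insert begins with the ClaI site and the 3' part of DJ306, and ends with the reverse complement of DJ305 followed by the ClaI site.

```python
# Translation table for complementing DNA while preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


DJ305 = "ACTATGATGGACGATACTGGAC"
DJ306 = "ATTGGATCGATCCGATGATCCTAATGAAGGC"
CLAI_SITE = "ATCGAT"

# Insert sequence as given in the text, upper-cased for comparison.
INSERT = (
    "atcgatCCGATGATCCTAATGAAGGCGCCCGGGTACTCCTTCTTGCATTCTTCAACTTC"
    "CTTCAACACTTGAGCGGAGTCGGTGCATCCGAACAATGGAAGCTTCCACATTGTCC"
    "AGTATCGTCCATCATAGTatcgat"
).upper()

# Part of DJ306 from its ClaI site onwards should open the insert.
dj306_from_site = DJ306[DJ306.index(CLAI_SITE):]
starts_with_dj306 = INSERT.startswith(dj306_from_site)

# The reverse complement of DJ305 should close the insert, before the ClaI site.
ends_with_dj305_rc = INSERT.endswith(reverse_complement(DJ305) + CLAI_SITE)

print(starts_with_dj306, ends_with_dj305_rc)
```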

Nucleotide sequence analysis revealed a difference between the sequence of A. thaliana 
Rubisco expected from the GenBank database and that obtained from all sequenced constructs 
and PCR products. Position 30 in the Rubisco insert is C rather than the expected A. This SNP 
was probably created by PCR. None of the oligonucleotide capture probes used in the example 
cover this region. 

Rubisco seq. in GenBank                           TCCTAATGAAGGCGCCA 
The sequence obtained from the plasmid construct  TCCTAATGAAGGCGCCC 
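
The single-base difference can be located by a position-wise comparison of the two 17-nucleotide stretches shown above. In this sketch the function name `mismatches` is hypothetical, and counting positions from the first base of the flanking ClaI site (atcgat) is an assumption about how position 30 was numbered.

```python
def mismatches(expected, observed):
    """Return (1-based position, expected base, observed base) for each difference."""
    if len(expected) != len(observed):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, e, o)
        for i, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]


GENBANK = "TCCTAATGAAGGCGCCA"   # Rubisco sequence expected from GenBank
PLASMID = "TCCTAATGAAGGCGCCC"   # sequence obtained from the plasmid construct

diffs = mismatches(GENBANK, PLASMID)

# First 30 bases of the insert, starting at the ClaI site (from the text).
INSERT_START = "ATCGATCCGATGATCCTAATGAAGGCGCCC"
offset = INSERT_START.index(PLASMID)          # 0-based start of the stretch
position_in_insert = offset + diffs[0][0]     # 1-based position in the insert

print(diffs, position_in_insert)
```

Under that numbering assumption the differing base falls at position 30 of the insert, in agreement with the text.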

C2. Construction of the recombinant splice variant #2 (Triamp18/swi5-lea) 



The Arabidopsis thaliana Lea gene (gil526423) was amplified from genomic DNA with 
primers named DJ307 5'-GGAATTATCGATGTGTGATAGGATCAGTGTTCAG-3' and 
DJ308 5'-AATTGGATCGATATTAGCAGTCTCCTTCGCC-3', including the ClaI linker sites 
as above. The PCR fragment was digested with ClaI and cloned into the yeast SWI5 IVT construct 
as above at the unique ClaI site. 
The fragment was inserted in the forward orientation, resulting in the following insert sequence: 
atcgatGTGTGATAGGATCAGTGTTCAGGGCTGTCCAAGGAACGTATGAGCATGCGAG 
AGACGCTGTAGTTGGAAAAACCCACGAAGCGGCTGAGTCTACCAAAGAAGGAGCT 
CAGATAGCTTCAGAGAAAGCGGTTGGAGCAAAGGACGCAACCGTCGAGAAAGCTA 
AGGAAACCGCTGATTATACTGCGGAGAAGGTGGGTGAGTATAAAGACTATACGGT 
TGATAAAGCTAAAGAGGCTAAGGACACAACTGCAGAGAAGGCGAAGGAGACTGCT 
AATatcgat. 

Figure 48 shows the construction of the recombinant splice variants in the in vitro transcription 
vector. The small bars show the location of the oligonucleotide capture probes used in this 
example. The sequences of the capture probes are shown in Table 15. 

D. Preparation of target 

D1. In vitro RNA preparation from splice variant vectors 

In vitro RNA from the splice variants was made using the MEGAscript™ high yield 
transcription kit according to the manufacturer's instructions (Ambion, USA). The yield of IVT 
RNA was quantified on a Nanodrop spectrophotometer (Nanodrop Technologies, USA). 

D2. Isolation of total RNA from C. elegans 

Strains and growth conditions: C. elegans wild-type strain (Bristol-N2) was maintained on 
nematode growth medium (NG) plates seeded with Escherichia coli strain OP50 at 20 °C, and 
the mixed stages of the nematode were prepared as described in Hope, I. A. (ed.) "C. elegans - 
A Practical Approach", Oxford University Press 1999. The samples were immediately flash 
frozen in liquid N2 and stored at -80 °C until RNA isolation. 

A 100 µl aliquot of packed C. elegans worms from a mixed stage population was 
homogenized using the FastPrep Bio101 from Kem-En-Tec for 1 min, speed 6, followed by 
isolation of total RNA from the extracts using the FastPrep Bio101 kit (Kem-En-Tec) according 
to the manufacturer's instructions. The eluted total RNA was ethanol precipitated for 24 hours 
at -20 °C by addition of 2.5 volumes of 96% EtOH and 0.1 volume of 3 M Na-acetate, pH 5.2 
(Ambion, USA), followed by centrifugation of the total RNA sample for 30 min at 13200 rpm. 
The total RNA pellet was air-dried and redissolved in 10 µl of diethylpyrocarbonate (DEPC)- 
treated water (Ambion, USA) and stored at -80 °C. 

E. Fluorochrome-labelling of the target 

The following fluorochrome-labelled cDNA targets were synthesized to test the 
performance of 'merged' splice junction probes that encompass exon borders. Synthetic RNAs 
corresponding to three artificial splice variants, #1 (exon01-INS3-exon03; 01-INS3-03), #2 
(exon01-INS4-exon03; 01-INS4-03) and #3 without the middle exon (01-03), were spiked into 
10 µg of C. elegans reference total RNA samples in various combinations and concentrations 
prior to fluorochrome-labelling with either Cy3 or Cy5 as indicated in Table 16. At the same 
time, 10 µg of C. elegans reference total RNA was labeled with Cy3 for control experiments. 
Hybridizations were performed with Cy3- and Cy5-labeled C. elegans RNA + spike RNA mix. 

The details of RNA samples and synthetic RNA spikes are shown in Table 16. The RNA 
samples were combined in individual labeling reactions with 5 µg anchored oligo(dT20) primer 
and DEPC-treated water to a final volume of 8 µl. The mixture was heated at 70 °C for 10 min 
and quenched on ice for 5 min, followed by addition of 20 units of Superasin RNase inhibitor 
(Ambion, USA), 1 µl dNTP solution (10 mM each dATP, dGTP, dTTP and 0.4 mM dCTP), 
3 µl of Cy3-dCTP or Cy5-dCTP (Amersham Biosciences, USA), 4 µl 5× RTase buffer 
(Invitrogen), 2 µl 0.1 mM DTT (Invitrogen), 400 units of Superscript II reverse transcriptase 
(Invitrogen, USA) and DEPC-treated water to 20 µl final volume. Background hybridization to 
merged capture probes was monitored with 10 µg of C. elegans reference RNA alone labeled 
with Cy3-dCTP, according to the labeling method described above for the splice variant spikes. 
All cDNA syntheses were carried out at 42 °C for 2 hours, and the reaction was stopped by 
incubation at 70 °C for 5 min, followed by incubation on ice for 5 min. 

Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR 
columns as described in the following: Pre-spin the column 1 min at 1500 × g in a 1.5 ml tube 
and place the column in a new 1.5 ml tube. Slowly apply the cDNA sample to the top centre of 
the resin, spin at 1500 × g for 2 min and collect the eluate. The RNA was hydrolyzed by adding 
3 µl of 0.5 M NaOH, mixing and incubating at 70 °C for 15 min. The samples were neutralized 
by adding 3 µl of 0.5 M HCl and mixing, followed by addition of 450 µl 1× TE, pH 7.5 to the 
neutralized sample and transfer onto a Microcon-30 concentrator (prior to use, spin 500 µl 
1× TE through the column to remove residual glycerol). The samples were centrifuged at 
14000 × g in a microcentrifuge for 12 min. Spinning was continued until volume was reduced to 5 µl. 
The labelled cDNA probes were eluted by inverting the Microcon-30 tube and spinning at 
1000 × g for 3 min. 

Table 16. Synthetic splice variant RNAs spiked into C. elegans samples*. 
Spike RNA         Ratio  Splice variant RNAs     Observed ratio  STDEV      Observed ratio  STDEV      Expected ratio 
concentration                                    LNA 50mer       LNA 50mer  LNA 40mer       LNA 40mer 

1000 ppm          5      Cy3: spike 01-INS3-03   0.76            0.05       0.61            0.19       0.83 
                  1      Cy3: spike 01-03        0.24            0.05       0.39            0.19       0.17 
                  1      Cy5: spike 01-INS3-03   0.16            0.04       0.06            0.03       0.17 
                  5      Cy5: spike 01-03        0.84            0.04       0.94            0.03       0.83 

                  5      Cy3: spike 01-INS3-03   0.77            0.11       0.68            0.22       0.83 
                  1      Cy3: spike 01-INS4-03   0.23            0.11       0.32            0.22       0.17 
                  1      Cy5: spike 01-INS3-03   0.12            0.04       0.11            0.15       0.17 
                  5      Cy5: spike 01-INS4-03   0.88            0.04       0.89            0.15       0.83 

[illegible] ppm   5      Cy3: spike 01-INS3-03   0.88            0.08       0.87            0.10       0.83 
                  1      Cy3: spike 01-INS4-03   0.12            0.08       [illegible]     0.10       0.17 
                  1      Cy5: spike 01-INS3-03   0.22            0.12       0.15            0.12       0.17 
                  5      Cy5: spike 01-INS4-03   0.78            0.12       0.85            0.12       0.83 

100 ppm           5      Cy3: spike 01-INS3-03   0.89            0.15       0.11            0.08       0.83 
                  1      Cy3: spike 01-INS4-03   0.11            0.15       0.89            0.08       0.17 
                  1      Cy5: spike 01-INS3-03   0.48            0.2        0.37            0.31       0.17 
                  5      Cy5: spike 01-INS4-03   0.32            0.2        0.43            0.31       0.83 

10 ppm            5      Cy3: spike 01-INS3-03   0.61            0.2        0.09            0.14       0.83 
                  1      Cy3: spike 01-INS4-03   0.39            0.2        0.91            0.14       0.17 
                  1      Cy5: spike 01-INS3-03   0.34            0.18       0.19            0.22       0.17 
                  5      Cy5: spike 01-INS4-03   0.66            0.18       0.81            0.22       0.83 


* Parts per million (ppm) calculations indicate spike transcripts per total transcripts in the
hybridisation mix. Calculations are based on an average C. elegans RNA being 1000
nucleotides as in Hill et al. (2000), Science 290:809-812.
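
The "Expected ratio" column and the ppm definition in the footnote can be illustrated with a short Python sketch. It is not part of the original example: the function names and the mass inputs are hypothetical, and transcript counts are assumed to be proportional to mass divided by length, with 1000 nucleotides as the average background transcript length quoted in the footnote.

```python
def expected_fractions(ratio_a, ratio_b):
    """Expected share of each splice variant for a spike-in ratio a:b."""
    total = ratio_a + ratio_b
    return ratio_a / total, ratio_b / total


def spike_ppm(spike_ng, spike_length_nt, background_ng, mean_length_nt=1000):
    """Spike transcripts per million total transcripts.

    Assumption of this sketch: the number of transcripts is proportional
    to mass / length, and the background RNA has the average length of
    1000 nt quoted in the footnote.
    """
    spike_count = spike_ng / spike_length_nt
    background_count = background_ng / mean_length_nt
    return 1e6 * spike_count / (spike_count + background_count)


major, minor = expected_fractions(5, 1)
print(round(major, 2), round(minor, 2))  # 0.83 0.17, as in the Expected ratio column
```

A 5:1 spike-in ratio therefore corresponds to expected fractions of 5/6 and 1/6, which is where the values 0.83 and 0.17 in Table 16 come from.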
F. Microarray hybridization 

The fluorochrome-labelled cDNA samples were combined. The following was added: 3.75 µl
20x SSC (3x SSC final; pass through a 0.22 µm filter prior to use to remove particulates),
yeast tRNA (1 µg/µl final), 0.625 µl 1 M HEPES, pH 7.0 (25 mM final; pass through a 0.22 µm
filter prior to use to remove particulates), 0.75 µl 10% SDS (0.3% final) and DEPC-water to
25 µl final volume. The labelled cDNA target samples were filtered in a
Millipore 0.22 µm filter spin column (Ultrafree-MC, Millipore, USA) according to the
manufacturer's instructions, followed by incubation of the reaction mixture at 100°C for 2-5
min. The cDNA probes were cooled at room temperature for 2-5 min by spinning at max speed
in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top
of the microarray spotted on the Immobilizer™ MicroArray Slide, and the hybridization mixture
was applied to the array from the side. An aliquot of 30 µl of 3x SSC was added to both ends
of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the
hybridization chamber (DieTech, USA). The chamber was sealed watertight and incubated at
65°C for 16-18 hours submerged in a water bath. After hybridisation, the slide was removed
carefully from the hybridization chamber and washed using the following protocol. The slides
were washed sequentially by plunging gently in 2x SSC/0.1% SDS at room temperature until
the cover slip fell off into the washing solution, then in 1x SSC, pH 7.0 (150 mM NaCl, 15 mM
sodium citrate) at room temperature for 1 min, then in 0.2x SSC, pH 7.0 (30 mM NaCl, 3 mM
sodium citrate) at room temperature for 1 min, and finally in 0.05x SSC (7.5 mM NaCl, 0.75
mM sodium citrate) for 5 sec, followed by drying of the slides by spinning at 1000 x g for 2
min. The slides were stored in a slide box in the dark until scanning.
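
The final concentrations quoted for the hybridization mix and the salt content of the wash buffers follow from simple dilution arithmetic. The Python sketch below is an illustration only; the helper names are hypothetical, and the only inputs are the volumes and stock concentrations given in the protocol above.

```python
def final_concentration(stock, volume_added_ul, final_volume_ul):
    """Dilution rule C_final = C_stock * V_added / V_final.

    The result is in the same unit as `stock` (x, mM or %).
    """
    return stock * volume_added_ul / final_volume_ul


def ssc_salts_mM(fold):
    """NaCl and sodium citrate (mM) in an SSC dilution.

    Scales the 1x SSC composition stated in the protocol
    (150 mM NaCl, 15 mM sodium citrate).
    """
    return 150.0 * fold, 15.0 * fold


# Components of the 25 µl hybridization mix
ssc_final = final_concentration(20.0, 3.75, 25.0)       # 20x SSC stock, result in x
hepes_final = final_concentration(1000.0, 0.625, 25.0)  # 1 M HEPES stock, result in mM
sds_final = final_concentration(10.0, 0.75, 25.0)       # 10% SDS stock, result in %
print(ssc_final, hepes_final, sds_final)
print(ssc_salts_mM(0.2), ssc_salts_mM(0.05))
```

The computed values agree with the 3x SSC, 25 mM HEPES and 0.3% SDS final concentrations stated in the text.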
G. Microarray data analysis. 
The splice variant microarray was scanned in a ScanArray 4000XL confocal laser scanner
(Packard Instruments, USA). The hybridisation data were analysed using the GenePix Pro 4.01
microarray analysis software (Axon, USA). The C. elegans reference RNA alone, converted to
first strand cDNA and labelled with Cy3-dCTP, did not produce significant fluorescence
intensity signals from the LNA-substituted, spike RNA specific capture probes.

G1. A mathematical formula for analysis of the microarray data for alternative splicing

One of the major limitations of comparative microarray hybridisation assays is that only
identical probes can be compared between samples. Different alternative splice forms are
detected using different probes, and this will tell directly if one splice form is more abundant in
a given tissue compared to another. However, the ratios between splice forms
in a single tissue are not directly accessible. For an example such as the one described below, we
employ the following calculations to estimate quantities of splice variants from array data. The
theoretical justification is shown. To our knowledge this justification has not been used in any
previous analysis.


splice form A:  exon1 - exon3            (probes shown: probe2, probe1)
splice form B:  exon1 - exon2 - exon3    (probes shown: probe3, probe1)

The above scenario is tested in a comparative hybridisation with two channels, I and II (the
signal from probe2 in channel I is called probe2(I), and so forth). Probe1 hybridises to both
splice forms, probe2 hybridises to A only, and probe3 hybridises to B only.
Since every transcript will hybridise to probe1, and every transcript will hybridise to either
probe2 or probe3, there exists some relationship between the following:
probe1(I) ~ {probe2(I) and probe3(I)}.
probe1(II) ~ {probe2(II) and probe3(II)}.
For simplicity we assume that systematic differences between channels have already been
eliminated through normalisation, although this is not essential.

We now imagine a factor (x) that will transform the signal of probe2 into a value directly
comparable to probe1. Likewise we imagine a factor (y) for probe3. As long as we are not facing
saturation in the hybridisations, the assumption of a linear relationship between absolute probe
signals is reasonable.

The introduction of variables x and y gives the following equations:
probe1(I) = (x)probe2(I) + (y)probe3(I).
probe1(II) = (x)probe2(II) + (y)probe3(II).

Since all signals are measurable, the above are two linear equations with two unknown variables,
which can easily be solved. Further, the ratio between (x)probe2(I) and (y)probe3(I) will provide
the ratio between splice forms A and B in channel I. Similarly, the ratio of (x)probe2(II) to
(y)probe3(II) is used for channel II.

Data normalization is not required for this method.

In the above equations, probe1 denotes all probes that will hybridize to both
spliceforms, probe2 denotes probes that will specifically hybridize to spliceform A but not B,
and probe3 denotes probes that will specifically hybridize to spliceform B but not A.
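
The two-channel system described above can be solved in closed form. The following Python sketch is an illustration rather than the software actually used for the example; the function names and the signal values in the usage lines are hypothetical. Each probe signal is passed as a pair (channel I, channel II).

```python
def solve_scaling_factors(probe1, probe2, probe3):
    """Solve probe1(c) = x*probe2(c) + y*probe3(c) for channels c = I, II.

    Each argument is a pair (signal in channel I, signal in channel II).
    Returns (x, y), or None when the two equations are not independent.
    """
    det = probe2[0] * probe3[1] - probe3[0] * probe2[1]
    if det == 0:
        return None
    # Cramer's rule for the 2 x 2 linear system
    x = (probe1[0] * probe3[1] - probe3[0] * probe1[1]) / det
    y = (probe2[0] * probe1[1] - probe1[0] * probe2[1]) / det
    return x, y


def splice_ratios(probe2, probe3, x, y):
    """Ratio of splice form A to splice form B in channel I and channel II."""
    return (x * probe2[0]) / (y * probe3[0]), (x * probe2[1]) / (y * probe3[1])


# Hypothetical signals, for illustration only
p1, p2, p3 = (400.0, 700.0), (100.0, 300.0), (400.0, 200.0)
x, y = solve_scaling_factors(p1, p2, p3)
ratio_I, ratio_II = splice_ratios(p2, p3, x, y)
```

With these made-up signals the factors are x = 2 and y = 0.5, so splice forms A and B are equally abundant in channel I, while A is six times as abundant as B in channel II.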

In the case where two spliceforms consist of gene78 with two different inserts/middle exons
(INS3 & INS4), probes can be grouped as follows (only LNA 40mers are considered here):


Probes that will hybridize to   Probes that will hybridize to   Probes that will hybridize to
both constructs                 INS3 constructs only            INS4 constructs only

Gene78.01a_40_LNA3              Gene78.INS3_40_LNA3             Gene78.INS4_40_LNA3
Gene78.01b_40_LNA3              Gene78.m01INS3_40_LNA3          Gene78.m01INS4_40_LNA3
Gene78.03a_40_LNA3              Gene78.mINS303_40_LNA3          Gene78.mINS403_40_LNA3
Gene78.03b_40_LNA3


The equations can be solved with any combination of one representative from each probe
group. This gives a total of 36 (4x3x3) possible ways of solving the equations. The estimated
quantities of the constructs are given as the average of all possible solutions (equations yielding
non-positive solutions are ignored). This was done for all comparative hybridizations. Note that
when comparing with gene78 with no insert, only 12 equations are possible (since the
artificial splice variant construct with no insert has only one specific probe). The results from
analysis of the microarray hybridization data from the RNA pools spiked with different splice
isoforms at different ratios and concentrations are shown in Figures 49 and 50.
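
The averaging over all probe combinations can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis code used for Figures 49 and 50: the function names are hypothetical, each probe is represented by its (channel I, channel II) signal pair, and a solution is treated as "non-positive" when either scaling factor is zero or negative.

```python
from itertools import product


def solve_pair(p1, p2, p3):
    """Solve p1[c] = x*p2[c] + y*p3[c] for the two channels c = 0, 1."""
    det = p2[0] * p3[1] - p3[0] * p2[1]
    if det == 0:
        return None
    x = (p1[0] * p3[1] - p3[0] * p1[1]) / det
    y = (p2[0] * p1[1] - p1[0] * p2[1]) / det
    return x, y


def average_estimates(common, a_only, b_only):
    """Average the estimated quantities over all probe combinations.

    common, a_only, b_only: lists of (channel I, channel II) signal pairs
    for probes hitting both constructs, construct A only and construct B only.
    Returns ((A in I, B in I, A in II, B in II), number of solutions used),
    or (None, 0) if no combination gives a positive solution.
    """
    sums = [0.0, 0.0, 0.0, 0.0]
    used = 0
    for p1, p2, p3 in product(common, a_only, b_only):
        solution = solve_pair(p1, p2, p3)
        # Skip singular systems and non-positive solutions (assumed criterion)
        if solution is None or solution[0] <= 0 or solution[1] <= 0:
            continue
        x, y = solution
        values = (x * p2[0], y * p3[0], x * p2[1], y * p3[1])
        sums = [s + v for s, v in zip(sums, values)]
        used += 1
    if used == 0:
        return None, 0
    return tuple(s / used for s in sums), used
```

With four probes in the first group and three in each of the other two, `product` enumerates 4 x 3 x 3 combinations; with a single specific probe for the no-insert construct it enumerates 4 x 3 x 1 = 12.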
RESULTS

Figure 49 shows detection of alternatively spliced mRNAs using LNA-substituted 50-mer
oligonucleotide capture probes in a bar diagram format. Figure 50 shows detection of
alternatively spliced mRNAs using LNA-substituted 40-mer oligonucleotide capture probes.
Both 50-mer and 40-mer LNA-DNA mixmer oligonucleotide capture probes,
substituted with an LNA nucleotide at every third nucleotide position, were able to provide
highly accurate measurements for fold-changes in the expression of three homologous,
alternatively spliced mRNA variants in the concentration range of 1000 ppm to 10 ppm. The
quantification of the splice isoforms was carried out using a set of both internal, exon-specific
probes and merged, splice junction specific probes, printed onto microarrays and hybridized
with complex cDNA target pools spiked with different cloned artificial splice isoforms in which
the middle exon was either alternatively skipped or excluded completely, resulting in the three
different splice isoforms: 01-INS3-03, 01-INS4-03 and 01-03.

Example 53. Expression Profiling of Toxicological responses in Caenorhabditis elegans using
LNA Oligonucleotide Microarrays and beta-naphthoflavone and primaquine as model
compounds.

The present patent example demonstrates the use of the Exiqon C. elegans LNA tox oligoarray
in gene expression profiling experiments in the nematode Caenorhabditis elegans. The C.
elegans tox oligoarray monitors the expression of a selection of 110 genes relevant for
general stress response and for the metabolism of toxic compounds. Two different capture
probes for each of these target genes have been designed and included on the LNA tox array.
In addition, the C. elegans LNA tox oligoarray contains capture probes providing control for
cDNA synthesis efficiency and the developmental stage of the nematode. Capture probes for
constitutively expressed genes for data set normalization are also included on the C. elegans
LNA tox oligoarray.

A. Cultivation of C. elegans worms 

For all cultures the sample is divided into two, and one half of the sample is used as the control,
the other as the treated sample. Worm samples are harvested and sucrose cleaned by standard
methods. For heat shock treatment: the heat shock sample is added to S-media preheated to
33°C in a 1 L flask suspended in a water bath at 33°C; the other sample is added to a 1 L flask
with S-media at 25°C. Both samples are shaken at approximately 100 rpm for an hour. For
β-naphthoflavone and primaquine treatment: 0.5 mL of 5 mg/mL β-naphthoflavone in DMSO or
0.5 mL of 20 mg/mL primaquine in DMSO were added to each 500 mL volume of S-media
culture after 28 hours of growth from L1. At the same time 0.5 mL of DMSO was added to the
control. Incubation was for 24 hours. Samples are then harvested by centrifugation at 3000 x g,
suspended in RNAlater™ (Ambion) and immediately frozen in liquid nitrogen.

B. RNA extraction 

RNA was extracted from the worm samples using the FastRNA® Kit, GREEN (Q-BIO)
essentially according to the supplier's instructions.

C. Design and synthesis of the LNA capture probes 

Capture probes were designed using in-house developed software. Regions with unique
mRNA sequence of the selected target genes were identified. The optimal 50-mer
oligonucleotide sequences with respect to Tm, self-complementarity and secondary structure
were selected. LNA modifications were incorporated to increase affinity and specificity.
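
The probe sequences of this example are written in the notation of Table 18, where uppercase letters depict LNA nucleotides, lowercase letters depict DNA nucleotides and mC denotes LNA methyl cytosine. The Python sketch below shows one way such a string could be parsed to recover the plain base sequence and the LNA positions. It is not the in-house design software; the function names are hypothetical, and the test string is the first sequence listed in Table 18.

```python
import re

# One token per nucleotide: "mC" (LNA methyl cytosine) or a single letter
TOKEN = re.compile(r"mC|[ACGTacgt]")


def parse_probe(sequence):
    """Return (plain base sequence, list of 0-based LNA positions)."""
    tokens = TOKEN.findall(sequence)
    if "".join(tokens) != sequence:
        raise ValueError("unexpected character in probe sequence")
    bases = "".join("C" if t == "mC" else t.upper() for t in tokens)
    lna_positions = [i for i, t in enumerate(tokens) if t == "mC" or t.isupper()]
    return bases, lna_positions


def is_every_third(sequence):
    """True if LNA sits at positions 0, 3, 6, ... and nowhere else."""
    bases, lna_positions = parse_probe(sequence)
    return lna_positions == list(range(0, len(bases), 3))


probe = "TgcmCatTgcAcgGgcActTgtTcgAtcTccTtcTgtTttActTttGgaTg"
bases, lna = parse_probe(probe)
print(len(bases), len(lna), is_every_third(probe))
```

Because the check rejects any character outside the notation, it can also be used to flag sequences that were damaged during scanning.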

D. Printing of the LNA microarrays 

The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark) using 
the MicroGrid II from Biorobotics (UK). The arrays were printed with a 10 nM capture probe 
solution. Two replicas of the capture probes were printed on each slide. 

20 E. Synthesis of fluorochrome labelled first strand cDNA from total RNA 

15 µg of C. elegans total RNA was combined with 5 µg oligo dT primer (T20VN) in an RNase-
free, pre-siliconized 1.5 mL tube, and the final volume was adjusted with DEPC-water to 14.5
µL. The reaction mixture was heated at +70°C for 10 min, quenched on ice for 5 min and spun
for 20 seconds, followed by addition of 1 µL SUPERase-In™ (20 U/µL, Ambion, USA), 6 µL
5x RTase buffer (Invitrogen, USA), 3 µL 0.1 M DTT (Invitrogen, USA), 1.5 µL dNTP (20 mM
dATP, dGTP, dTTP; 4 mM dCTP in DEPC-water, Amersham Biosciences, USA), and 3 µL
Cy3™-dCTP or Cy5™-dCTP (Amersham Biosciences, USA). First strand cDNA synthesis was
carried out by adding 1 µL of Superscript™ II (Invitrogen, 200 U/µL), mixing and incubating
the reaction mixture for 1 hour at 42°C. An additional 1 µL of Superscript™ II was added and
the cDNA synthesis reaction mixture was incubated for an additional 1 hour at 42°C; the
reaction was stopped by heating at 70°C for 5 min and quenching on ice for 2 min. The RNA
was hydrolyzed by adding 5 µL of 1 M NaOH and incubating at 70°C for 15 min. The samples
were neutralized by adding 5 µL of 1 M HCl, and purified by adding 450 µL 1x TE buffer, pH
7.5 to the neutralized sample and transferring the samples onto a Microcon-30 concentrator.
The samples were centrifuged at 14000 x g in a microcentrifuge for ~8 min, the flow-through
was discarded and the washing step was repeated twice by refilling the filter with 450 µL 1x TE
buffer and spinning for ~12 min. Centrifugation was continued until the volume was reduced
to about 5 µL, and finally the labelled cDNA probe was eluted by inverting the Microcon-30
tube and spinning at 1000 x g for 3 min.
F. Synthesis of fluorochrome labelled cRNA from total RNA 

First and second strand cDNA syntheses were made using the MessageAmp™ aRNA Kit
(Ambion) according to the supplier's instructions. Five micrograms of C. elegans total RNA was
used as template for cDNA synthesis. Syntheses of fluorescent cRNA were made according to
the MessageAmp™ aRNA Kit (Ambion) protocol with minor modifications. Cy3™-UTP or
Cy5™-UTP (6 µL of a 5 mM solution, Amersham Biosciences, USA) replaced biotin-CTP. The
final concentration of ATP, CTP, and GTP was 7.5 mM, whereas the concentration of UTP was
reduced to 4.9 mM.

G. Hybridization with fluorochrome-labelled cDNA or cRNA

The arrays were hybridized overnight using the following protocol. The Cy3™ and Cy5™-
labelled cDNA or cRNA samples were combined in one tube, followed by addition of 3 µL
20x SSC (3x SSC final), 0.5 µL 1 M HEPES, pH 7.0 (25 mM final), 25 µg yeast tRNA (1.25
µg/µL final), 0.6 µL 10% SDS (0.3% final), and DEPC-treated water to 20 µL final volume.
The labelled cDNA target sample was filtered in a Millipore 0.22 micron spin column
according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by
incubating the reaction at 100°C for 2 min. The sample was cooled at 20-25°C for 5 min by
spinning at max speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was
carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide, and
the hybridization mixture was applied to the array from the side. An aliquot of 30 µL of 3x SSC
was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide
was placed in the hybridization chamber. The chamber was sealed watertight and incubated at
65°C for 16-18 hours submerged in a water bath. After hybridisation, the slide was removed
carefully from the hybridization chamber and washed using the following protocol. The
LifterSlip coverslip was washed off in 2x SSC, pH 7.0 containing 0.1% SDS at room temperature
for 1 min, followed by washing of the microarrays in 1.0x SSC, pH 7.0 at room
temperature for 1 min, and then in 0.2x SSC, pH 7.0 at room temperature for 1 min. Finally the
slides were washed for 5 seconds in 0.05x SSC, pH 7.0. The slides were then dried by
centrifugation in a swinging bucket rotor at approximately 200 x g for 2 min. The slides were
then ready for scanning.
H. Data analysis. 

Following washing and drying, the slides were scanned using a ScanArray 4000XL scanner
(PerkinElmer Life Sciences, USA), and the array data were processed using the GenePix™ Pro
4.0 software package (Axon, USA). The data in each image were normalized so that the ratio of
means of all of the features is equal to 1.
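
The normalization and the log2 transformation used for the fold changes can be sketched in Python as follows. This is an interpretation, not the GenePix procedure itself: it assumes that "ratio of means" is the per-feature ratio of the two channel means, that the image is rescaled so the arithmetic mean of these ratios is 1, and that the Cy5 channel is the numerator. The function name and the intensity values in the usage lines are hypothetical.

```python
import math


def normalized_log2_ratios(cy5_means, cy3_means):
    """Log2 fold changes after scaling the mean feature ratio to 1.

    cy5_means, cy3_means: background-corrected mean intensities per feature.
    Assumption of this sketch: Cy5 is the numerator channel.
    """
    ratios = [cy5 / cy3 for cy5, cy3 in zip(cy5_means, cy3_means)]
    scale = len(ratios) / sum(ratios)  # makes the mean of the ratios equal to 1
    return [math.log2(ratio * scale) for ratio in ratios]


# Hypothetical intensities, for illustration only
log2_changes = normalized_log2_ratios([400.0, 100.0, 100.0], [100.0, 100.0, 100.0])
```

A log2 value can be converted back to a fold change as 2 raised to that value, so a log2 value of 2.35 corresponds to roughly a five-fold increase and a value of -1 to a two-fold decrease.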

RESULTS 

Use of LNA-modified oligonucleotide capture probes in Exiqon's C. elegans LNA tox
oligoarray clearly allows the identification of distinct expression profiles for C. elegans genes
relevant for general stress response and for the metabolism of toxic compounds.

Table 17. Expression profiling using LNA Oligonucleotide Microarrays.
Log2 transformed fold changes for selected genes in the two expression profiling
experiments hybridised with cRNA target.

                      Tox compound
Gene name             Primaquine    beta-naphthoflavone

ABC_C34G6.4                         1.01
ABC_F57C12.4          -1.11
CYP_C03G6.15                        2.35
CYP_C06B3.3                         2.47
CYP_C49G7.8                         2.40
CYP_F14F7.2           -1.24         -1.03
CYP_K07C6.5                         2.68
CYP_K09D9.2                         2.14
DC_W05G11.3           1.16
ER_26S                -1.09         -1.01
HSP_C47E8.5                         1.17
HSP_F26D10.3                        1.05
HSP_F43D9.4                         1.27
NAP_D2096.8                         1.14
PPGB_F13D12.6         1.08          1.21
RAD_Y116A8C.133                     1.13
RPL_K11H12.2          1.15          1.42
UbLF25B5.4                          1.37

Table 18. LNA-modified oligonucleotide capture probes. LNA modifications are depicted by
uppercase letters in the sequence; mC denotes LNA methyl cytosine.


Oligo Name 

CEABC_C34G6.4_u293_LNA3
CEABC_C34G6.4_u375_LNA3
CEABC_F57C12.4_u15_LNA3
CEABC_F57C12.4_u480_LNA3
CEABC_F57C12.5_u111_LNA3
CEABC_F57C12.5_u444_LNA3
CEABC_K08E7.9_d8_LNA3
CEABC_K08E7.9_u51_LNA3
CEABC_Y39D8C.1_u37_LNA3
CEABC_Y39D8C.1_u422_LNA3
CEADH_H24K24.3a_d3_LNA3
CEADH_H24K24.3a_u50_LNA3
CEAPEX_R09B3.1_u191_LNA3
CEAPEX_R09B3.1_u37_LNA3
CEAPO_C35D10.9_u15_LNA3
CEAPO_C35D10.9_u609_LNA3
CEAPO_C48D1.2_u176_LNA3
CEAPO_C48D1.2_u23_LNA3
CEAPO_F20C5.1_u453_LNA3
CEAPO_F20C5.1_u96_LNA3
CEATPase_B0365.3_u31_LNA3


Sequence


TgcmCatTgcAcgGgcActTgtTcgAtcTccTtcTgtTttActTttGgaTg 
TcaTtcTagGatTgcmCagAtgGttAtgAtamCtcAtgTcgGagAgaAagGa 
mCcaAtgTtgTttAatTggTtgTaaTgtmCttGatGacmCtgmCatAatmCatAt 
mCacAagAtcmCtgTgtTgtTctmCcgGaamCaaTgaAaaTgaActTagAtcmCa 
TacTtgTtcTcgAcaAagGttGtgTagmCcgAgtTtgAcamCtcmCgaAgaAa 
TgaActTggAtcmCctTctTtgmCatTtaGcgAtgAtcAaaTttGggAagmCg 
TcaTtaAttTtgTgtAgcTttmCttTctmCgaTttTtgmCacGatmCttTccmCc 
AggGtgmCctActAcaAacTgamCccAaaAgcAgaTgamCcgAgaAgaAatAa 
AttGaaAgcGacGcgGaaAgtGccAtgTatTtcTaaTttTgtTttmCttTa 
TtgTcaGcaTatmCaaGagTagAtaTggAagTggAtamCacTctGctAatmCc 
mCacmCttAttGcgTtcAatTttTgtTtcmCacmCtamCtamCtamCgaAtamCgtTg 
TcamCaaGggAgaGagTctGcgGtcGgtGctGgcGttmCgaGaaAatAtaAc 
mCatGcaTccmCgamCgaGaaGaaGtamCtcAttTtgGagTtaTctGgcGaaTt 
GacmCatGctmCcgGtcGtcAtgmCaaAtcGacTtcTaaAttGctTctGatTa 
TtgmCatGctGttAaaAccTatmCgtGtamCaaTatTgcmCtgTatAttmCccmCt 
TggmCacAgcTtaAtaAcaAatTggAaaGtcGagGatTagTcgGtgTtgAa 
GacAcamCgcAaaGgaTatGgaTgtTgtTgaGclGctGacTgaAgtmCaaTa 
AgcAcgAaamCtcTgcmCgtmCtaAaaTtcActmCgtGatTcaTtgmCccAatTg 
AtgGtcAtamCtcTaaAatGggmCagAacTtcAacmCaaAtcAttmCtcGtcAg 
AacmCcgAgcTtgmCcgmCaaAgtGcaAgaAaaTtaTagAacGaaTgaAacAg 
GgaTggGtcGagmCgtGagAccTacTacTaaAgaAcaGctTgtGaaTctTt 
mCaamCgtTctmCgaTtcmCtamCggAcaAgaAtgGacrnCtaTgcmCaamCagA 
CEATPase_B0365.3_u386_LNA3 aaGa 
CEATPase_C17H12.14_u356_LNA 

3 TgcTcgTtaTccAgcTatTttGaaGggActTgtmCatGcaAggActTctTc 
CEATPase_C 1 7H 1 2. 1 4_u89_LN A3 mCcgTttAgaGctTatTgcTaamCcaGatTgtmCccAcaAgtmCagAacAgcTc 
CEATPase_F55F3.3_u215_LNA3 TgamCggAcgmCtamCtamCccAtaTgtAttTgtTccAtcTtamCcaGcaAccAa 

AgcTacTtcAttmCgamCaaGgaAcaTctmCggAaaAgtmCaaGtamCatmCccG 
CEATPase_F55F3.3_u275_LNA3 g 

CEATPase_Y49A3A.2_u103_LNA3AaaTtcAagGatmCcaGttGccGatGgtGaaGccAagAttmCgcAagGatTa 

mCgaTcgTttmCtgmCccAttmCtamCaaGacTgtmCggTatGctmCaaGaaTatG 
CEATPase_Y49A3A.2_u272_LNA3 a 

CECALR_Y38A10A.5_u238_LNA3 TcaGgaAcgAtcTttGacAacAttAtcAtcAccGacTctGttGagGagGc 
CECALR_Y38A10A.5_u296_LNA3 TgaActmCtamCtcTtaTgaAagmCtgGggAgcmCatmCggAttmCgaTttGtgGc 
CEC AT_Y54G 1 1 A.5b_u 1 37JJM A3 GaamCttTgcAggGccGctmCggGgaAtgTcaTgaTttmCatTatTaaGggAa 
CEC AT_Y54G 1 1 A.5b_u 1 89_LN A3 GtcAatTctGggAgaAggTgtTggAtamCcgGggmCtcGggAgaGaaTgtGc 
CECC_C03D6.3_u275_LNA3 AtgTaaAgaAggAatGctTccmCgaAtgGatTggAtaTttAttTgtmCcaGa 
CECC_C03D6.3_u430_LNA3 GgamCcgAaaTttGtgmCagmCatGtcGgamCacGaaAttGatGgtmCtcAttTt 
CECC_C07G2.3_d9_LNA3 mCagAcamCgaAggTtamCgaTagAtaAccAtcTctmCaaAgtmCiaTcgAccTc 
CECC_C07G2.3_u44_LNA3 mCgamCgaTgtGcgTgtTccTgamCgaTgaAagAatGggAtaTtaAgaAaamCc
CECC_Y46G5A.2_u331_LNA3
CECC_Y46G5A.2_u385_LNA3 
CECoA_C29F3.1_u31 6_LNA3 
C ECo A.C29F3. 1 _u392JJMA3 
C ECo A_F08 A8.4_u 1 094_LN A3 
CECoA^F08 A8-4_u1 260_LNA3 

C ECo A.F59F4. 1 _u 1 09_LN A3 

C ECoA_F59F4. 1_u424_LN A3 

CECoA_Y25C1A.13_u1 15_LNA3 

CECoA_Y25C1 A. 1 3_u451_LNA3 

CECOL__C27H5.5__493_LNA3 

CECOLJ227H5.5_u680JJSIA3 

CECOQ_ZC395.2_u199_LNA3 

CECOQ_ZC395.2_u400_LNA3 

CECRYZ_F39B2.3_u171_LNA3 

CECRYZ_F39B2.3_u222_LNA3 

CECyclin_R02F2.1 a_u24_LNA3 
CECyclin^R02F2.1 a_u312_LNA3 
CECyclin_ZC 1 68.4_u203_LNA3 
CECyclin_2C1 68.4_u273_LNA3 
CECYP_B0213.15_u133_LNA3 
CEC YP_B02 13.1 5_u202_LN A3 
CECYP_B0304.3_u38_LNA3 
CECYP_B0304.3_u89_LNA3 
CECYP_C03G6. 1 4_u706_LN A3 
CECYP_C03G6. 1 4_u768_LN A3 
C ECYP_C03G6. 1 5_d9_LN A3 
CECYP_C03G6.15__u148_LNA3 
CECYP_C06B3.3_u1 02_LNA3 
CECYP_C06B3.3_U474_LNA3 
CECYP_C12D5.7_u399_LNA3 
CECYP_C1 2D5.7_u65_LNA3 
CECYP_C45H4.1 7_u27_LNA3 
CECYP_C45H4. 1 7_u598_LN A3 
CECYP_C45H4.2_.u1 1 0_LNA3 
CECYP_C45H4.2_u429_LNA3 
CECYP_C49C8.4_u363_LNA3 
CECYP_C49C8.4_u883_LNA3 
CECYP_C49G7.8_d6_LNA3 
CECYP_C49G7.8_u795_LNA3 
CECYP_F01 D5.9_u374 J.NA3 
CECYP_F01D5.9_u46_LNA3
TtgTgcTccAtcGrtGctmCcgmCttAcaGacTtgAcaAcgmCtcAccTttGc

AatGagmCggTtgTgcmCg^Gtp>\cgTcamC^Cgtm(^(^gtGttGctmaamCt 

AaaTtgAcamC^aAtcAaaTctGtcTcaTrtmCctGagGacmCgtmCaamCttmCg 

AatmCttTgtGtamCggAgaTggGgcAaaAggmCagmCaaGaaAgtAaamCcaAg 

AggAcaAggGgcActActGgcAcaGgcTttGatTatTgcAgtGagAtaTt 

TtaAtgGagGtgAcaAtgGgtTccTlgGatTcgAtaAatTccGagTgcmCc 

GctmCmmCtcmCagTggGctmCaaAatAgtrnCaamCtcAacAgaTcgGaaGttmC 

t 

AaaGctTcgAgaTggmCacGttmCgtmCtgTatmCtcGtgAagAacTtaTtgmCa 

GatTcgmCtgAacTtlAtcAagAcgTggAatAtgAgcmCagmCtcmCtgTcgAc 

GatmCttAtcAccGcgTgcGatAttmCgaGtaGctTcamCagGatGcgAttTt 

GgaAagGaaGgaTccAttmCtcAgcTctGcamCttmCcamCcaTcaGagmCcaTg 

TggAtamCaaGgaGggAlcTggmCagTggTggAtcTggAagTggTggAtaTg 

TtgAaaGaamCtcmCttGccGacGatmCctGaaAcamCacAaaGaaTtgmCtgAa 

AtgTggGatGagGagAaaGaamCatTtaGatAcaAtgGaaAgaTtaGctGc 

AggmCtgAgcTctTggActTtgGcaTcaAcaTtgTctmCatTctTgaAggAa 

TtaTggTtamCagAagGagmCtgTltAcgGtgTagmCatTggGaaTgtmCttmCc 

mCacTtcAacmCaamCtcmCgtGttAatmCaaGcaAgcmCgcmCacmCatmCta 

AtgAg 

TctmCatTgcTcgTcgAggmCtamCcaAcaAacActGgcAatAccmCaaTtaAt 

TaaGaaAgtm CatTgaGgaTgcTgtmCgcTttGctm Cgcm Cg aAgtmCtcG taTa 

AagTtcAtcmCtgTtgAcgGaaTcgAggmCggAgaAtgmCtgTatmCggTcaTt 

AcaGgaAatAtgAttTtgGatTtcGatTttGaaTcgGttGgtGctGccmCc 

GctGagmCtgTatTtgGctAgtGaaAtgTgtGttTttGatActTtaAatGa 

AcgAggTttGgaTcamCaaTcaGaaTtcTgtGaaAtaAgcGttTttTggGa 

AgtTctmCggTctAacAgtGtcTccmCgtTgaAtaTtcTtgTaaAatmCacAc 

AtgAccActmCaaAatActGctAaaAgaTttGcaGcgGcaGaaGccGttAa 

TtgAtaTggmCtgTacmCtgTatGgtTttTgaGgamCgtTttTtaGgaGtcGa 

AttTatTcaTtcAtcmCatGtaAacTgtAtaTttTgaAttTgtGttGiaAa 

GccAaaGcaG aaTtgTatTtg AtcTtcGgtAacm Cttm CtcmCttm CgcTacAa 

AttTtgAatmCttmCtgGgaAaaTgcmCatmCcamCtcGagAaamCcgTtcmCgtTt 

mCtaAcgGagGatmCtcGccAatTatmCttTgaGagAcaAaamCtgAaamCtcmCt 

AtcTagTccmCaaTgaAtcTccmCacAtgmCtgTtamCtcGtgAtgTtcAacTc 

TttTgcTttmCatmCgcAaaAgcTcaAgaTtamCacAtgTcaGgtmCaaGccAa 

mCcgmCgamCttTaaAgaGaaGatmCatAaaTttGcaTtgTttTttGttTgtAt 

mCgaGggTgaTtcGgaGacTttmCagTaaTgtmCcaActTtcAaaTgtTtgmCa 

TagAtamCaaGatAcaTccmCtcAaaAgaAggmCctAccGtcAatGgcmCaaAg 

TcaAcgmCgtmCtaTaaAtgAalmCacAacGagGtaTcaAcaTtcTccmCccTg 

AtgmCtgAtgTtgAaaTtgmCtgGctAccGtaTtcmCaaAagAtamCtgTaaTc 

AtgAatmCcaTggmCttGgamCatmCtcmCcgTttTtcAagGgaTatAaaAatGt 

AtgrnCaamCgaAttAgtGaaAaaTtcAtcmCtgGaaTaaAaaAtaAttmCtaAa 

AtcGctAcgAcaAtcTttmCcgAtgmCctTcgAagTrtmCgaAagmCttTctmCt 

GagGtcGgtGgaGgaGgaAgtGgaAatTgamCggmCaaAatmCctGccmCaaGg 

mCccTctTtgGgaTttm(>»mCtcAagTttActGttmCggmCagmCagTgaTalAa
CECYP_F08F3.7_u25_LNA3

CECYP_F08F3.7_u401_LNA3 

CECYPJ r 14F7.2_u397_LNA3 

CECYP_F14F7.2_u68_.LNA3 

CECYP_F42A9.5_u435_LNA3 

CECYP_F42A9.5_u55_LNA3 

CECYP_K07C6.3_u3_LNA3 

CECYP_K07C6.3_u354_LNA3 

CECYP_K07C6.4_u1 18_LNA3 

CECYP_K07C6.4_u87_LNA3 
CECYP_K07C6.5_u7_LNA3 
CECYP_K07C6.5_u99_LNA3 
CECYP_K09A1 1 .3_u362_LNA3 

CECYP_K09A1 1 .3_u48 JLNA3 
CECYP_K09A1 1 4_u238_LNA3 
CECYP_K09A1 1 .4_u68_LNA3 
CECYP_K09D9.2_u151_LNA3 
CECYP_K09D9.2_u866_LNA3 
CECYP_T1 0B9. 1 0_u41 0_LNA 
CECYP_T1 0B9. 1 0_u56_LNA 
CECYPJT1 0B9.7_u 1 02_LNA3 
CECYP_T 1 0B9. 7_u267_LN A3 
CECYP_T 1 9B1 0. 1 _u1 00_LNA3 
CECYP_T1 9B1 0. 1 _u3 1 9_LN A3 
CECYP_Y49C4A.9_u121_LNA3 
CECYP_Y49C4A.9_u41 3_LNA3 
CECYP.ZK1 77.5_u394_LNA3 
CECYP_ZK1 77.5_u445_LNA3 
CED AO_C47 A1 0.5_d9_LN A3 
CEDAO_C47A1 0.5_u269_LNA3 
CEDC_C01 A2.3_u373_LNA3 
CEDC_C01 A2.3_u96_LN A4 
CEDC_C34 F6. 1 _u301 _LN A3 
CEDC_C34F6.1_u450_LNA3 
CEDC_F33D1 1 .3_u126_LNA3 
CEDC_F33D1 1 .3_u14_LNA3 

CEDC_F46E 1 0.2_u392_LN A3 
CEDC_F46E10.2_li54J_NA3 
CEDC_F56G4.2_u382_LNA3 
CEDC_F56G4.2_u82_LNA3 
CEDC_M162.2_u103_LNA3
GagTtgGttmCcamCagAatGctTagGacGttTaaAttmCgtmCacAaamCttTt

mCaaTatGgtT(XinCalTttAgcAacTcaTatGaarnCacAgaAgaTgtmCctTg 

GaaAaaGgcGtcGacAttTtaTgtGacAcgTggAcamCttmCacTatGacAa 

TaaTtgAatTacGggTctTttGtamCatAttAatTttAgtAtamCttTgtGa 

AtaTcaAtgmCaamCtaTlaAtgAatmCacAacGtcTtgmCcaAtcTtcTccmCg 

GgaGtgActAtgAaaGcaAagAgtTacmCgaTtgAaamCtgAaaGacAgamCa 

AatmCttTaaTgaTaaTttAtgGgaTctGlaTttmCtcTttmCtgTcaAtaAa 

AtgAgcmCcamCaaAtgTaaAagGatAcgAgaTtgAttfnCggGaamCagTcaTg 

AtcmCtgmCgaTatGacAttAagmCcamCatGgtTctGaamCctTcaAcaGaaGa 

mCtgAacmCltmCaamCagAagAtaAacTtcmCgtAtaGcgmCtgGaaAaamCtc 

mCt 

AttTaaAgg AatTcamCagmCtcAaaAaaTaaTaam Ctam CcgGttmCag AgaTt 

AatTtgAgcmCacAtgGcaAgtTatmCaamCagAggAgamCaaTgcmCgtAcaGt 

TgamCatTctActTaaAggGaaGaaAatAccAacTggTacmCctTgtAttTg 

TcamCc^mCaaAgcmCatAcaTatGcgAgcTagTtcmCtcAggmCtgmCMAaam 

Cc 

TtcGacAaaActAttTtgGaaAgaAcaAtcmCcaTtcAgtGtcGgcAaamCg 

TctGac Aac Aaa Gcc AtamCacGtgm CcgActAatTccAcaAtcAgcTag Aa 

TtgGcaAaaGcaGaaTtgTatTtaAtcTttGgaAacmCtcmCttmCttmCgcTa 

TgaAtcTttmCaaActTatmCacTccTttTaaTacTacmCgtTccTgtTtgGa 

AttGagAnGtaTccAtlGgcGtcTctTgtTcamCaaTcgAaaAtgTctmCa 

AacTgcTacTatTgcGccAtcAagTgtGctGctmCaaActTaaAtcmCagGt 

TtgAgamCagGaaAtaAgamCtaGaaTtcmCtlTgaAacTggTggGaaGtgmCt 

AagAtgTcaAagAatTcaAgcmCagAacGatGgtmCcamCcgAcgAgcmCatTa 

AttGaamCcaActmCtgAaaTatAatGacAcaAaamCcaTgtmCtgGaaGtgGt 

GgcAatGtgAcaAtaTctmCcaAtgGttmCttmCacAgcAatmCatmCacGtgTt 

mCtaTtc AatmCgaTatTttAtcAca mCcaTccAgtGctGgam CctmCcaTcaTt 

GtcTcaGagAtgTgtAaaTttActTccmCtgmCaaTttGttTcamCgcAacTa 

TtcmCgaAtgTttmCcaAttGggActGaaGttTcaAgaGtcAccmCagAaaAa 

GatmCcaGcaTctTccAagmCttAcaTtcmCtcmCgtGctTgtAtcAagGaaAc 

TttGaaAacrnCtgTttTatTatTaaAatAgaTaaTtgAttAgtTctGtannCg 

AtamCgtTgcActGcaTccGgcTatGagGgaGccAaaAatmCttAggGgaGt 

GcamCttmCcaTtcAtcTctGcaGctActAtgGctTtgGtgAcaAaaGttGg 

mCcgTccAaaAgaAtgmCcaTctrnCacAagTctTgaAatmCltAtaAagGtaGt 

GagGgaTcaAcaGtaAccTcgTgcGgtAtlGacAagGgaTgtmCcgGaaGg 

GatGgtTctTcgAtcGcaAacAaaAcaGatGtgmCtcmCatTtamCatAcgGa 

AtgGagAaaAtgGatmCtgAtgGagTtgmCagGaaGtgAtgGagmCtcmCagGa 

TgaAtcTccAtaAatTatTcaAtgTttmCcaAatAttTaaTttAtcAatTg 

GctmCaamCacGgtAggAtcmCtaTggAacmCgtmCggAggAgcAggmCctmCg 

gAg 

mCgtGacAacmCtcTtaTttAttTclGtaAaamCtgAttmCgcmCaaActTttGt 
GaaGctTtcAaamCcaAatGagTtcmCttmCccGgaAtcmCcaAagAatAccAa 
AcaAtgAaaAgaGagGatGgaAagGaaAtcGaaGtcTctGttmCttGacGa 
GatGagGtamCatAacTttGtgTgcAgtTalAggmCcaTctAcaGtamCctGc 


LNA2I discriminating probes SKA/MSL 


5/14/2003 


CEDC_M1 62.2_u480J_NA3 
CEDC_R10E4.1 1_ji274_LNA3 
CEDC_R10E4.1 1_u397_LNA3 
CEDC_T04C9. 1 _u321_LNA3 
CEDCJT04C9. 1 __u64_LNA3 

CEDC_W02A2.3_u32_LNA3 

CEDC_W02A2.3_u374_LNA3 
CEDC_W05G1 1 .3_u1 53_LN A3 
CEDCJ/V05G1 1.3_u51_LNA3 
CEDC__ZK863.5_u256_LNA3 
CEDC_ZK863.5_u324_LNA3 
CEEPHX_Y55B1 BR.4_U161_LNA3 
CEEPHX_Y55B1 BR.4_u93_LN A3 
CEER_1 8S_u388_LNA3 
CEER_18S_u82J-NA3 
CEER_26S_u342_LNA3 
CEER_26S_u38J_NA3 
CEFOXO_R 1 3H8. 1 b_u33 1 _LN A3 

CEFOXO^R 1 3H8. 1 b_u393_LN A3 

CEGAPDH_K1 0B3.7_u21 _LNA3 
CEG APDH _K1 0B3.7_u727_LNA3 
CEGBA_F1 1 E6.1a_u232_LNA3 
CEGBA_F1 1 E6. 1 a_u451 _LNA3 
CEGLLLC02A12.1_u264_LNA3 
CEGLU_C02A12.1_u55J.NA3 
CEGLU_C46F1 1 .2_u271 _LNA3 
CEGLU_C46F1 1 .2_u45_LNA3 
CEGLU_F26E4.12_u109J.NA3 
CEGLU_F26E4. 1 2__u480 J.N A3 
CEGLILR07B1 .4_u166_LNA3 
CEGLU_R07B1 .4_u38_LNA3 
CEGLU_T09A12.2_u220J.NA3 
CEGLLLT09A1 2.2_u335J.NA3 
CEG LU_T28A1 1.11 _u299 J.NA3 
CEGLU_T28A1 1 .1 1 _u54_LNA3 
CEGPD_B0035.5_u256_LNA3 
CEGPD_B0035.5_U478_LNA3 

CEHS P_C09 B8.6_d8_LN A3 
CEHSP_C09B8.6_u286_LNA3
TtcmCatmCatmCacTaamCcgAttGtcmCtgAcaTtgAtgGccAaamCcaGggAa

TcamCatTatmCgaAcaAgtActAgtAagmCatGctGtgAtgGagTgcmCgcTa 

mCacGgaGatmCacGacAtcAaaGcgGatTgcTtaGagTgtGgaAacmCgtmCt 

ActAtcTacGtgGcafnCgtTggActmCalmCatmCgaTggGaamCgamCgtAtaAg 

TctmCtgGccAgtTwmCttTglGatmCaaTctmCagAttmCgtniCcamCacAagAt 

mCtamCttmCk^mCaaGaaGgcmCcgTcgTttmClaAtcGatmC^aAcaTc^ 

Ac 

AtgGatGatmCgamCccActTgcmCacTgamCccAcaAtcmCcgmCacTcamCta 
mCc 

AagAcgGagAggmCtgGagAgaAcgGtamCcgAtgGagAgcmCagGaamCtgAt 

mCcamCccAggAggAggGatAcaAgaGaaGaaAgtAcaGatTctmCcaActAa 

AgtTt^camCttmCttTtlGccGttTtgGtlmCccGttAtcAalmCcaTtgAt 

mCttTtaTatTctmCatmCaaTttGttTccTacTtgGtcAgcTgaGgaTcgTt 

TtcGgcAcaAatGgaGcaAaaGtaTcgTggTtaTtgTgaTgcGatTatTc 

mCtamCtaTgaAtgAgcTcamCtgGacTcaTttAtcAacTcgAgtmCaaAagmCc 

GttGgcGaaTctTcgGgtTcgTatAacTtcTtaGagGgaTaaGcgGtgTt 

GaamCtgAttmCgaGaaGagTggGgamCtgTcgmCttmCgaGgtTtaAcgActTc 

TgtTatTgcGaaAgtAatmCctGctTagTacGagAggAacAgcGggTtcAa 

TgcAtamCgamCttGgtmCtcTtgGtcAagGtgTtgTatTcaGtaGagmCagTc 

TgtGctmCagAatmCcamCttmCttiTiCgaAatmCcaAttGtgmCcaAgcActAacTI 

TtaAgamCgg AacmCaaTtg m CtcmCacmCacm CatrnCalAccAcg AgtTgaAca 

Gt 

AcaTtgmCtamCcaAggmCctAagmCcgmCttnriCaaAttmCtcTaaGtcTgaAatG 
a 

GttGagTccAccGgaGtcTtcAccAccAtcGagAagGccAatGctmCacTt 

AgtAaaTtcmCttmCcamCgtGgaTctActmCgtGtgTtcAcaAagAtcGagGg 

GgtmCcaAtaAtgGgaGacTggTtcmCgcGcaGaaAgtTatGcaGatGatAt 

AgaAaamCttmCgtTggAccmCtgmCtaAggAgaAgtAttTcaAgcTtcTgaGc 

GagmCacmCcgAagmCtcAagmCcaTatTtgGaaAcaAgamCcaTacTctTcaAa 

GttAccmCtcTacAaaTctmCgcTtcAatmCcaAtgTtgTtcGcaGtcAccAa 

mCcgAagAgcTcgTtamCtaTgcGagGagGtgTgaAgcmCggAatAatTttTt 

AagTtcTlgGttGgamCgcGatGggAaaAttAtcAagAgaTttGgamCcaAc 

AcgAttTcaAcgTcaAaaAtgmCtaAtgGtgAtgAcgTgtmCacTttmCggAt 

AccTggGttGatGttTttGcgGctGaaAgtTtcTccAagmCtcAttGatTa 

GaaGtamCgtrnCicmCcaAagAaaAgcTacmCccAgcTtaAggmCatTgcAcaAt 

GcgmCcaGatAtgTatTcaAagAtcGagGtaAatGgtmCagAacAdmCatmCc 

AatmCtamCagGgaAaaAggAttTcgAgtTgcmCgcGttTccAtgmCaaTcaAt 

AgaTggmCaaAgaAgcAtamCalAacTgaAacTctTccmCggGgaGctActAc 

TgaAtaAacGggmCcgAacTaaAtcmCatTcgTcaGtgGaaAtgGgaAacAa 

GtcmCgtmCttmCctGatGctTatGaamCgcmCtaTttmCtcGaaGtaTtcAtgGg 

TgtGgaAaaGctmCtcAacGagAagAaaGcaGaaGttmCgtAtamCaaTtcAa 

AtaTcgm CcgmCctGctTccTcarnCcaAccm CgaAtaAcgmCaamCaaAaam Ctt 

Ta 

AagAgcmCcamCtcAtcAagGatGaaAgtGatGgaAagActmCttmCgtmCtcAg 




CEHSP_C12C8.1_u127_LNA3
CEHSP_C12C8.1_u1531_LNA3
CEHSP_C47E8.5_u310_LNA3
CEHSP_C47E8.5_u361_LNA3
CEHSP_F26D10.3_u276_LNA3
CEHSP_F26D10.3_u397_LNA3
CEHSP_F43D9.4_u169_LNA3

CEHSP_F43D9.4_u275_LNA3

CEHSP_F44E5.4/5_u123_LNA3
CEHSP_F44E5.4/5_u380_LNA3
CEHSP_F52E1.7_u175_LNA3
CEHSP_F52E1.7_u448_LNA3
CEHSP_F54D5.8_u252_LNA3
CEHSP_F54D5.8_u318_LNA3
CEHUS_H26D21.1_u117_LNA3
CEHUS_H26D21.1_u478_LNA3
CEMRE_ZC302.1_u169_LNA3
CEMRE_ZC302.1_u292_LNA3
CEMTL_T08G5.10_d127_LNA3
CEMTL_T08G5.10_u45_LNA3
CENAP_D2096.8_u356_LNA3
CENAP_D2096.8_u70_LNA3
CEPAI_F56D12.5_u241_LNA3
CEPAI_F56D12.5_u301_LNA3
CEPDI_C07A12.4_u28_LNA3
CEPDI_C07A12.4_u433_LNA3

CEPDI_C14B1.1_u119_LNA3
CEPDI_C14B1.1_u358_LNA3
CEPGK_T03F1.3_d9_LNA3
CEPGK_T03F1.3_u424_LNA3
CEPON_E01A2.7_u223_LNA3
CEPON_E01A2.7_u79_LNA3

CEPPGB_F13D12.6_u44_LNA3
CEPPGB_F13D12.6_u440_LNA3

CEPPS_T14G10.1_d2_LNA3
CEPPS_T14G10.1_u240_LNA3
CEPRDX_R07E5.2_u405_LNA3
CEPRDX_R07E5.2_u42_LNA3
CEPYC_D2023.2_u256_LNA3
mCaaGatAttTtaAcaAaaAtgmCatmCaamCaaGaaGccmCaaTcaGgtTccGg

mCttGggmCalTctGtamCggGatGctGtcAttActGtgrnCctGcaTatTttAa 

AagAagmCatmCtcGaaAlcAacmCcaGacmCacGctAtcAtgAagAcamCttrnCg 

AtgAaaGctmCaaGctmCltmCgtGatTccTclActAtgGgaTacAtgGccGc 

TtaAgcAgamCcaTtgAggAcgAgaAgcTcaAggAtaAgaTcaGccmCagAa 

mCgtmCttTccAagGatGacAttGaamCgcAtgGtcAacGaaGctGagAaaTa 

GtcGacTtgGctmCacAlcmCacAccGtcAtcAacAagGaaGgamCagAtgAc 

mCaaTctTgaGggAcamCgtTctmCacmCatTgaGggAcamCcamCgaGgtmCa 

aGa 

TcamCtaAaaTgcAccAatrriCtgGacAatmC^mCtgmCttmCtgnriCtgGalGcgmC 
1 

TcaTgaAgcTaaAcaAttmCgaAaaGgaAgaTggTgaAcaAcgGgaAcgTg 

AagTatAacmCttmCcaAcaG ggGtc mCgtm CcaGaamCaaAtcAagTccGaaTt 

TttAacmCatGgcmCgcAgaTtcTtcGatGacGtcGacTttGatmCgcmCacAt 

GcgTcgAaaAgaTctmCccTgaAgtmCtgmCatTgamCtgGccTtgAtaTtaTg 

AcaTagTctTcgTcaTcaAggAtaAgcm CacAccmCgaAatTcaAgcG agAg 

TcgmCcaAcamCtcGgamCacGlgmCcaAaaTgaAtaTcaTctmCaaAtcGaaTg 

GtcGaaGttAgaAatmCcaGaaGccGatAttGnTctmCatmCaaAttmCcaAt 

ActAclmCgtGgaAgaTccAalAaaGttGttTcaAcgmCgamCaaAtcGatTc 

GgcAgtGaaGatGaaGtgGcaAatTctGatGaaGaaAtgGgaAgcAgtAt 

TtgTcaAcgAccAgaAgcAaaAatTatGggAatmCgcGatAaaAttmCaaGg 

GatGcaAgtGtgmCcaAclGcgAatGtgmCtcAggmCtgmCtcAttAatTtgAa 

GacGatAlgTtcGatTtcmCcaGgaGagGacGglGatGatGtgTcaGacTt 

GacGatAtgTtcGalTtcmCcaGgaGagGacGgtGatGatGtgTcaGacTt 

GagGtcGtcGtaAtcmCacAagGctmCcaAgaAagmCaaGtgmCtcGacAttTc 

GatActTttGgcAagmCtcGttmCcaAtcAagAagGagGtcAtcmCcaGatmCg 

GatGagGagGgamCacAccGagmCtcTaaAtcmCacAttmCcaAtamCagTtcAa 

mCttAtgTccGaaGatAtcmCcaGagGatTggGacAagAacmCcaGtcAagAt 

TacmCk:cAgtm(^amCtaTgaTggAgamCagAaamCctmCgaGaaGttmCgaAg 

aAt 

mClcGtcGccTccAacTtcAacGaaAttGccmCttGatGaaAccAagActGt 

TtcTatTgtTtaTtcmCttGccmCaaTagTgtAttTgtAttTatTctTtcTc 

mCaaAtcmCatmCtcmCcaGtgGatTtcGtcAttGctGacAagTtcGccGagGa 

GtlTctGatTcgAcamCttTatGgamCcaTctmCaaGttmCtgmCgaGttTctTt 

GggAaamCaaAtgAttGttGglAcaGtaGccmCgcmCctGctAttmCacTgtGa 

mCgaGcamCalmCatmCcaAtcGttm^GttmCaamCaaGgcmCttmCtaAtcGtl 

Ag 

TgaTgaGagmCccAgtAacmCaaTtaTttGaamCcgTcaGgaTgtGcgTaaGg 

mCgtmCtaAtcGaaGaaGggGatmCgtGggmCaaTcaTaamCtaAttAacmCttm 

Ca 

m CaaTggmCtcm CagGtcTttmCtgm CtcTtc AtaTacTtcmCatTccGagTtgmCt 
GttmC^cTtgGagmC^gAagTtgTcgmCgtGctmCgtGtgAttmCtcActTctniCt 
TcgmCtamCcaGcaAggAatActTcaAcaAggTcaAcaAgtGatmCacAcaGa 
AagGaaAttGtaActmCgcm(^aAgaGctmCtcmCcaGgtGtcnriCg1GgamCatAt 



CEPYC_D2023.2_u427_LNA3
CERAD_F10G7.4_u169_LNA3

CERAD_F10G7.4_u267_LNA3
CERAD_F32A11.2_u250_LNA3
CERAD_F32A11.2_u380_LNA3
CERAD_T04H1.4_u274_LNA3
CERAD_T04H1.4_u375_LNA3
CERAD_W06D4.6_u325_LNA3
CERAD_W06D4.6_u34_LNA3
CERAD_Y116A8C.13_u289_LNA3
CERAD_Y116A8C.13_u59_LNA3
CERAD_Y39A1A.23_u221_LNA3
CERAD_Y39A1A.23_u276_LNA3
CERAD_Y41C4A.14_u509_LNA3

CERAD_Y41C4A.14_u731_LNA3
CERAD_Y43C5A.6_u131_LNA3
CERAD_Y43C5A.6_u429_LNA3
CERFC_F31E3.3_u128_LNA3
CERFC_F31E3.3_u55_LNA3
CERPL_K11H12.2_d1_LNA3

CERPL_K11H12.2_u172_LNA3
CERT_F36A4.7_u1396_LNA3
CERT_F36A4.7_u2302_LNA3
CERT_F36A4.7_u289_LNA3
CERT_F36A4.7_u2919_LNA3
CERT_F36A4.7_u4269_LNA3
CERT_F36A4.7_u5485_LNA3
CESLC_F52F12.1a_u249_LNA3

CESLC_F52F12.1a_u76_LNA3
CESLC_K11G9.5_u400_LNA3
CESLC_K11G9.5_u462_LNA3
CESLC_Y32F6B.1_u179_LNA3
CESLC_Y32F6B.1_u280_LNA3
CESLC_Y37A1C.1a_u104_LNA3
CESLC_Y37A1C.1a_u404_LNA3
CESLC_Y70G10A.3_u383_LNA3
CESLC_Y70G10A.3_u46_LNA3
CESOD_C15F1.7_u435_LNA3
CESOD_C15F1.7_u9_LNA3
CESOD_F10D11.1_u326_LNA3
TtgActGgaTtgGagAttGcgGaaGaaGttGatGttGaaAtcGagAgtGg
GocAagTctmCaaGcaAtaAgtGttGatfnCaaTcaGagmCcaTacGgaGagAt 
AtaTtgAgamCttmCggGacAagmCggActTctmCatmCtgTcamCagmCaamCtg 
mCc 

GatmCcgmCagAgaAtcGagTatTtcmCtcTcgAgamCccAtgGatAtcAacTg 

TccGttAagAagmCtcActGgaAaaAcamCacGgcTcgAacGaaAnGgaAt 

AatTtgGatGagAgcAaaGtgGaaGgaAtgGctAtcGttTtgGcaGatAt 

GtgmCtgGtcAaaAaaTgcTtgmCtlmCgtTgcTtaTtcGcaTtgmCacTcgmCa 

nfiCttrnCgaGaamCtcTtcAagTtgGaaTcaAcaGtgGcaTcgGatAcamCatGa 

GtgmCctTctGaaGccGaaGaaAacGacGatTagTtaAatGttTccAagTt 

GatAaaAtcGatAgcGacGacGatGagGaaGccGatGatGagGagmCtcGa 

GcaGgtGgaTacGgaTgtGgaGctGacTttTgcGtlTtaTcaAgaAtcTc 

TccmCgtAgaAgtAgaAatGctAgaAgaAccTgaAcaAgaAgaTcaAgaAa 

TgcAagAtgTcaGtaTtgAaamCaaTtcmCtgTagAgamCccmCcgAagAaaAt 

AgtmCtcGtaTccGggAatGttTcaGccTgtGaaAatGctTgtTgaAgamCg 

mCttmCaaAacmCgtmCgcTttTaaGgaTacAggAacGtgGcamCgcTtcrnCgaG 

g 

mCagAttGtamCctTcgAaaAggAaaAggAgaGaaTcgmCgtnriCgcAaaAatGg 

TgaTggmCltTgaTtaTtcGagmCagGagmCaaTgaTgtmCcgAgaGtcGttAt 

mCaaTgamCgaGaaTatTggAgtAatGggGaaActGgtTgcGacTtgmCgaAa 

TtgGaaAacAatmCncmCtcGacTttmCtgmCtcActmCttmCgtGaaActAtcmCa 

TctTgtTatTttAttTtgTttTggGctTgtTccGaaAatGaaAtgGttGt 

mCaaTggAtcAcc AagmCcaGttmCacAagm Cac mCgtGagmCaa AgaG gamCt 

cAc 

mCttTgtGatGtgAtgActGcgAagGgamCacTtgAtgGctAttAcgAgamCa 

G agmCcaGctActmCag Atg AcamCtc AacAcgTtcmCatTatGca GgaGttTc 

TacActmCcaTccTcgmCcgAcaTacAatmCcaAcaTctmCcamCgcGgaTtcTc 

AtgGagAagAtgGttTggAtgGaaTgtGggTtgAgaAtcAgaAtaTgcmCg 

AacmCggGatAccGtgTcgAacGtcAcaTgaAagAtgGcgAtaTaaTcgTc 

GagGagAttAaamCgcAtgTcaGtgGctmCatGtcGagTttmCcaGaaGtcTa 

AgaTatTgcmCtcTacTtaTcaTggGccTgaTggmCttTgtmCtgmCcgGtaTt 

GaaTctmCaamCca m CttmC tgG aamCcc mCalAcamCca AtgG at AgaAgamC 

ggAg 

GttGttmCttTttTccGtgAtcTttTcaTgtTtaTgtmCtgAacGtgGcaGg 

GacTcgTtgGtgTctTgcTagG atGtcTtgGgtTcaTtcm CtcAatmCgtTg 

GtamCtgGgcTcgAggGctGaaActAalmCgaAgaAgaAacTccAgaAgaTa 

GgaTcaTgcTctGttTacGacActGatGagTtaAgaGtcAgamCtgmCacGt 

mCgaTggTtcTtcTcgTctAtcAtaTcgGggTagTtgmCcgAagTgtTgaAa 

mCaaAtcGaamCtgGtaTaaAggAggAccGacGgaGacGaaTttGaamCgaGa 

AttmCgaTcaAagAacTctGgcTctmCggmCgtTaamCtgGacAttTgtTcgTc 

mCtcmCccGagmCagGcgAttAttmCacGctAgtTatGctmCaaAtgTgaTctGt 

mCcgGtamCtaTctGgaTcamCacAgaAgtmCcgAaaAtgAccAggmCagTtaTt 

mCccAgtGacTacmCtgAatmCgcGtcTctGaaTctmCcamCacAatTccTacTa 

GgaGttGctmCacmCgcAatTaaGagmCgamCttmCggAlcTctGgaTaaTctTc 



CESOD_F10D11.1_u477_LNA3

CESULT_EEED8.2_u316_LNA3
CESULT_EEED8.2_u82_LNA3
CESULT_Y113G7A.11_u252_LNA3

CESULT_Y113G7A.11_u96_LNA3
CESULT_Y67A10A.4_u108_LNA3
CESULT_Y67A10A.4_u327_LNA3
CETOPO_K12D12.1_u398_LNA3
CETOPO_K12D12.1_u449_LNA3
CETOPO_M01E5.5b_u256_LNA3
CETOPO_M01E5.5b_u429_LNA3
CEUbi_F25B5.4_u186_LNA3
CEUbi_F25B5.4_u2_LNA3
CEUbi_F29B9.6_u145_LNA3
CEUbi_F29B9.6_u230_LNA3
CEUbi_M7.1_u239_LNA3
CEUbi_M7.1_u53_LNA3
CEUGT_F39G3.1_u40_LNA3
CEUGT_F39G3.1_u466_LNA3
CEUGT_M88.1_u480_LNA3
CEUGT_M88.1_u72_LNA3
YAL009W_u145_LNA3
YAL009W_u341_LNA3
YAL059W_u262_LNA3
YAL059W_u51_LNA3
YER109C_u109_LNA3
YER109C_u436_LNA3
YHR152W_u128_LNA3

YHR152W_u510_LNA3
YKL130C_u211_LNA3
YKL130C_u85_LNA3
YKL178C_u199_LNA3
YKL178C_u367_LNA3

YLR443W_u179_LNA3

YLR443W_u86_LNA3
YOR092W_u251_LNA3
YOR092W_u82_LNA3
YPL263C_u132_LNA3
YPL263C_u257_LNA3
AaaTtgAggAaaAgcTtcAcgAggmCggTctmCcaAagGaaAcgTcaAagAa

mCaaTcgTacmCatGaaAgaAgtTggAagmCcamCgtGcaAgaGaaGaaAtcmC 

a 

AagAagAttmCctGacmCagAgaGacTcamCgtGctTacmCcaAgaAgcAtcTa 

AgcAttGgtGgaAatAcgAaaTggmCatGggAagAgaAacmCccTctmCaaTt 

mCtgGttAcgGtaGtgTatGgtmCccTgtmCctrnCtcAgaAtgmCaaAtaTgtmCg 

TctAcgTcgAtgGaaAagmCcgAttTaamCaaTcaAagmCk^AcaAcgmCagTt 

GgaAagGtgmCcaAaaAgtTgamCagmCaaTtgGagGatmCttAttmCatTgcfnCa 

AgaTgaTgaTgaAgtTccTgcAaaGaaGccTgcTccAgcGaaGaaAgcTg 

AaaAccTcgTacTggAaaAggAgcTgcGaaAgcGgaAgtTatmCgaTttGt 

GagAagGccmCagAagAagTacGacAgamCtgAagGagmCagTtgAaaAagTt 

TtcTgtmCatAcaAtcGtgmCtaAtcGgcAggTtgmCgaTccTttGtaAccAt 

AagmCttmCggAcamCcaTtgAgaAtgTcaAagmCcaAaaTccAggAtaAggAg 

AatrnCgaAccmCatmCaaTtcActmCgtTatTccTccTcgAtcTccGttmCaaGt 

mCtgAacmCatmCcaAatAttGaaGatmCcaGctmCagGctGaaGccTatmCagAt 

mCgtGtgmCttAtcTctTctGgaTgaAaamCaaGgaTtgGaaGccGtcAatmCt 

mCggAagmCatmCtgmCctTgamCatTctmCcgTtcGcaGtgGtcGccGgcTctG 

AaaGtamCgcTatGlgAggAggmCtaAcamCcaTtcAtaTaaGaamCgcAgcmCa 

TgtTgc m Cgt AgaAgaGag ActAaa AclAag AacGatTgaTtgAagGtcTg 

TacAatTctTtgmCagGaaGcaAtaTccGccGgaGtcmCccmCttAtcActAt 

mCtcAcgGagGttAtaAttmCtaTgcAggAggrnCaaTttmCtgmCtgGagTtcmCa 

AccGttTcaTgaGagmCtgTaaTcaGgtGttGttTctGtaAaaAgtGtgAa 

GtgGatGtgAaaTtaGtcmCtcAacmCccAgaGcaTttAgtGcaGagAttAg 

GcaGttTaaTgtGaaGctAgtTaaAgtAcaGtcTacGtgGgamCgaGaaAt 

AttGccAagTccAttTctmCgtGccAagTacAttmCaaAatAcaAgaAagGc 

AgamCtcmCtamCaaAtaGatTcgGtgTccTgcmCagAcgAtgTtgAagAatAg 

TtgAagTttGggAatAttGgtAtgGttGaaGacmCaaGgamCcgGatTacGa 

GagGcgmCaaGtaGgcAatGatTcaAgaAgtAgtAaaGgcAatmCgtAacAc 

TgaGcamCaaAgtTaaGatGttmCggAaaGaaAaaGaaAgtmCaaTccTatGa 

mCaaGtgAccAatrriCagniCacGcamCggmCttmCcaTccTcaAgamCtgAtaTta 

mCc 

AttAaaTgcGcaGatGagGacGgaAcgAatAtcGgaGaaActGatAatAt 

GatGgtAagmCtgAgcGccTtgGacGaaGaaTttGatGttGtcGctActAa 

TacGtcAcgmCaaGgamCagAgcTtlGacGacGaaAtaTcamCttGgaGgaTt 

TctmCccTgtGtaGgtAcamCcaAtaTcamCaaGcgmCatTtcTatGtcGacTa 

TgcTaamCacmCagTltAgamCcaTggAaaTccmCacmCgcAaaTatAagmCaa 

Tg 

GcaGgamCatAagAttmCcgGtcAagmCaamCgamCagTgaAgaAagTatGcaA 
a 

mCcgTctAgtGaaAgcGggAtgGctAaaTtgGgaAaamCgamCaaGatGttAt 
GatGctTcaAtaTccTrtGatGgtmCgtTagTttAccAttTttGgtGtcH 
AgtmCatTtgAgtTatGlgAagAccGttGgtGggAaaGaaGagAtcAggTg 
GtcTtgGctAccAcaniCccAaaAccGttmCgaAacTttAagAgcAttmCtamCt 
Example 54. Evaluation of different LNA substituted oligonucleotides as probes for
fluorescence in situ hybridization (FISH) on metaphase chromosomes and interphase nuclei.

Locked Nucleic Acids (LNA) constitute a novel class of DNA analogues that have an
exceptionally high affinity towards complementary DNA and RNA. Using human classical
satellite-2 repeat sequence clusters as targets, we demonstrate that LNA/DNA mixmer
oligonucleotides are excellent probes for FISH, combining high binding affinity with short
hybridization time and even with the ability to hybridize without prior thermal denaturation of
the template. The development of molecular probes and image analysis has made fluorescence
in situ hybridization (FISH) a powerful investigative tool. Although FISH has proved to be a
useful technique in many areas, it is a fairly time-consuming procedure with limitations in
sensitivity. Probes with higher DNA affinity may potentially reduce the time needed for
hybridization and increase the sensitivity of the technique. Indeed, improvement in hybridization
characteristics has been reported for the DNA mimic peptide nucleic acid (PNA). This example
describes the development of LNA substituted oligonucleotides as probes for fluorescence in
situ hybridization on metaphase chromosomes and interphase nuclei. In each experiment a
different LNA substituted oligonucleotide of the same 23-bp human satellite-2 repeat sequence
(attccattcgattccattcgatc) was used, cf. Jeanpierre, M. (1994). Human satellites 2 and 3.
Annals of Genetics 37, 163-171. Oligomers with various LNA content, different labels, and
hybridization conditions were used and compared with each other, and the optimal
conditions were determined for an efficient LNA-FISH protocol.
A. MATERIALS AND METHODS
A1. Chromosome Preparations

Chromosome preparations were made by standard methods from peripheral lymphocyte
cultures of normal males. Slides were prepared 1-4 days prior to an experiment and treated with
RNase (10 µg/µl) at 37°C for one hour before hybridization.
A2. Probe preparation

The 23-bp human satellite-2 repeat sequence, attccattcgattccattcgatc, was used to prepare the
LNA/DNA mixmers with different content and sequence order of LNA modifications (Table
19). All mixmers were labeled at the 5' end with either Cy3 or biotin. Biotin amidite was
purchased from Applied Biosystems and Cy3 amidite was purchased from Amersham
Biosciences. A DNA oligonucleotide of the same sequence without any LNA modifications was
used as a control in each experiment.
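
As a small illustration of the probe notation used here, the following Python sketch checks the length of the satellite-2 probe, derives the strand it is expected to pair with, and strips the capital-letter LNA notation to recover the DNA control. The function names are hypothetical helpers introduced for this sketch and are not part of the source.

```python
# Illustrative helpers only; the names below are assumptions, not from the source.
PROBE = "attccattcgattccattcgatc"  # satellite-2 repeat sequence quoted in the text

# Complement table that preserves letter case (capitals mark LNA units).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_dna_control(mixmer: str) -> str:
    """Drop the LNA (capital letter) notation to get the unmodified DNA oligo."""
    return mixmer.lower()


probe_length = len(PROBE)
target_strand = reverse_complement(PROBE)
control = to_dna_control("aTtCcAtTcGaTtCcAtTcGaTc")
```

Here `control` equals `PROBE`, which reflects that every mixmer and the DNA control share one base sequence and differ only in which positions carry LNA units.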
A3. Fluorescence in situ hybridization

FISH was carried out as described by Silahtaroglu AN, Hacihanefioglu S, Guven GS, Cenani A,
Wirth J, Tommerup N, Tümer Z. (1998) Not para-, not peri-, but centric inversion of
chromosome 12. J Med Genet. 35(8):682-4 (1), with the following modifications.
The amounts of probe were 6.4, 10, 13.4 and 20 pmoles. Denaturation of the target DNA and
the probe was performed at 75°C for 5 minutes, either separately using 70% formamide or
simultaneously under the coverslip in the presence of the hybridization mixture containing 50%
formamide. In addition, the effect of omitting the denaturation step was tested. Two alternative
hybridization mixtures were used: 50% formamide/2xSSC (pH 7.0)/10% dextran sulphate or
2xSSC (pH 7.0)/10% dextran sulphate. Hybridization times included 30 min, 1 hr, 2 hrs, 3 hrs and overnight.
Hybridization temperatures included 37°C, 55°C, 60°C and 72°C. Post washing was either as
for standard FISH (1), or with 50% formamide/2xSSC at 60°C, or without formamide.
Hybridization signals with biotin labeled LNA substituted oligonucleotide probes were
visualized indirectly using two layers of fluorescein-labeled avidin (Vector Labs) linked by a
biotinylated anti-avidin molecule, which amplified the signal 8-64 times. The hybridization of
Cy3 labeled molecules, however, was visualized directly after a short washing procedure.
Slides were mounted in Vectashield (Vector Laboratories) containing 4',6-diamidino-2-phenylindole
(DAPI). The whole procedure was carried out in the dark. The signals were
visualized using a Leica DMRB epifluorescence microscope equipped with a SenSys
charge-coupled device camera (Photometrics, Tucson, AZ), and IPLAB Spectrum Quips FISH
software (Applied Imaging International Ltd, Newcastle, UK) within two days after
hybridization.
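
The variables listed above span a grid of candidate conditions. The following Python sketch enumerates that grid purely as bookkeeping; it assumes a full factorial layout, which the text does not claim was actually run, and all names are hypothetical.

```python
from itertools import product

# Values copied from the text; the variable names are assumptions for this sketch.
PROBE_PMOL = [6.4, 10, 13.4, 20]
HYB_MIXTURES = [
    "50% formamide/2xSSC (pH 7.0)/10% dextran sulphate",
    "2xSSC (pH 7.0)/10% dextran sulphate",
]
HYB_TIMES = ["30 min", "1 hr", "2 hrs", "3 hrs", "overnight"]
HYB_TEMPS_C = [37, 55, 60, 72]


def condition_grid():
    """List every combination of amount, mixture, time and temperature.

    This is a full factorial enumeration; the source only lists the values
    tried and does not state that every combination was tested.
    """
    return [
        {"pmol": pmol, "mixture": mix, "time": time, "temp_c": temp}
        for pmol, mix, time, temp in product(
            PROBE_PMOL, HYB_MIXTURES, HYB_TIMES, HYB_TEMPS_C
        )
    ]


grid = condition_grid()
```

A grid like this can be filtered down to the subset actually performed, for example by keeping only entries with `temp_c == 37`.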

B. RESULTS AND DISCUSSION

Satellite-II DNA, composed of multiple repeats of a 23-bp and a 26-bp sequence, is especially
concentrated in the large heterochromatic regions of human chromosomes 1 and 16, but is also
found in the heterochromatic regions of chromosomes 9, Y and 15, and in other minor sites such as the
short arms/satellites of the acrocentric chromosomes and some centromeric regions. Classical
satellite DNA can be visualised by FISH with traditional genomic and DNA oligonucleotide
probes (see Kokalj-Vokac N, Almeida A, Gerbault-Seureau M, Malfoy B, Dutrillaux B.
(1993) Two-color FISH characterization of i(1q) and der(1;16) in human breast cancer cells.
Genes Chromosomes Cancer. 7, 8-14; and Tagarro I, Fernandez-Peralta AM, Gonzales-Aguilera
JJ. (1994) Chromosomal localization of human satellites 2 and 3 by a FISH method
using oligonucleotides as probes. Hum Genet. 93(4):383-8). Because of this, and because
satellite-2 DNA has distinct major and minor sites in the genome, we used the 23-bp satellite-2
repeat sequence, attccattcgattccattcgatc, as a convenient model to test the efficiency of various
DNA/LNA mixmers for FISH analysis and the effect of different experimental conditions by
recording the number, location and strength of signals on each metaphase. To compare the
efficiency of mixmers with different LNA content (Table 19) and to optimize the LNA-FISH
protocol, different conditions were tried at each step of a standard FISH protocol as described in
Materials and Methods. All LNA substituted oligonucleotides (LNA/DNA mixmer
oligonucleotides) for the human satellite-2 sequence gave very prominent signals when used as
FISH probes. The signal on chromosome 1 was always the strongest and appeared earliest,
followed by signals on chromosomes 16, 9, Y, 15, other acrocentric chromosomes and the
centromeric regions of other chromosomes, respectively (Figure 51). In general, biotin labeled
mixmers gave stronger signals with a higher background, whereas Cy3-labeled molecules gave
a significantly lower background.

B1. Effect of LNA content of the LNA substituted oligonucleotides (LNA/DNA mixmers)
The LNA-2 molecule, which had every other nucleotide modified as LNA
(aTtCcAtTcGaTtCcAtTcGaTc), gave the best results in all the experiments performed. The
LNA-3 molecule, with every third nucleotide modified as LNA
(aTtcCatTcgAtTccAttCgaTc), also gave hybridization signals, but with less efficiency than the
LNA-2 probes. Preferably, an LNA-2 oligonucleotide molecule has an LNA unit at every other
nucleotide position in the sequence and an LNA-3 oligonucleotide molecule has an LNA unit at
every third position of the sequence. However, minor deviations, e.g. in one position or in less
than 5-10 percent of the nucleotide positions in the sequence, may still provide the general
features of an LNA-2 or an LNA-3 molecule.

The Dispersed LNA (aTtccatTcgaTtccAttcgaTc), which had 5 dispersed LNA modifications,
was less efficient in short-term hybridization, but gave signals on both chromosomes 1 and 16
after overnight hybridization. The LNA/DNA mixmer with 3 LNA Blocks
(aTTCcattcgATTccattcGATc) was comparatively inferior as a FISH probe.
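
The substitution patterns described in this section can be inspected programmatically. The Python sketch below is illustrative only: it assumes the convention stated with Table 19 (capital letters mark LNA units) and uses hypothetical helper names. It generates a strict "every n-th position" pattern, reports where LNA units sit, and measures the longest contiguous LNA block.

```python
# Hypothetical helpers for the capital-letter LNA notation; not from the source.
DNA = "attccattcgattccattcgatc"


def substitute_every_nth(dna: str, n: int, first: int) -> str:
    """Mark every n-th nucleotide as LNA, starting at 1-based position `first`."""
    return "".join(
        base.upper() if pos >= first and (pos - first) % n == 0 else base.lower()
        for pos, base in enumerate(dna, start=1)
    )


def lna_positions(mixmer: str) -> list:
    """1-based positions of LNA units (capital letters)."""
    return [pos for pos, base in enumerate(mixmer, start=1) if base.isupper()]


def spacings(mixmer: str) -> list:
    """Distances between consecutive LNA units."""
    positions = lna_positions(mixmer)
    return [b - a for a, b in zip(positions, positions[1:])]


def longest_lna_block(mixmer: str) -> int:
    """Length of the longest run of contiguous LNA units."""
    longest = current = 0
    for base in mixmer:
        current = current + 1 if base.isupper() else 0
        longest = max(longest, current)
    return longest


lna2 = substitute_every_nth(DNA, 2, first=2)
lna3_spacings = spacings("aTtcCatTcgAtTccAttCgaTc")
```

With these definitions, `lna2` reproduces the LNA-2 sequence quoted above, while the LNA-3 probe as printed shows a single spacing of 2 among spacings of 3, which is the kind of minor deviation the paragraph allows for.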
B2. Effect of amount of the LNA/DNA mixmers

The initial experiments performed with 20 pmol of LNA/DNA mixmer resulted in bright and
large signals, but with an extremely high background. Thus, lower amounts were tested
(13.4 pmol, 10 pmol and 6.4 pmol). The amount giving the optimal signal-to-noise ratio
was found to be 6.4 pmol.
B3. Effect of denaturation

The signals on the major sites of hybridization (1q, 16q) were equally bright after both types of
denaturation. However, smaller and weaker signals were observed on the minor sites with the
simultaneous denaturation protocol. To check the potential "strand invasion" property of LNA,
some of the experiments were performed without a denaturation step. As expected, no signals
were obtained with the control DNA oligonucleotide probe. In contrast, hybridization signals on
chromosomes 1 and 16 were observed after overnight hybridization with LNA probes, with the
LNA-2 mixmer giving the best signals. Compared to the signals obtained in experiments
involving a denaturation step, the signals were smaller, but prominent and without any
background.

B4. Effect of hybridization time, temperature and post-hybridization washes
Although signals could be observed after only 30 min of hybridization, the optimal
hybridization time and temperature for LNA-2, which gave the best signals, was 1 hr at 37°C.
A 3x5 min wash with 0.1xSSC at 60°C and with 4xSSC/0.05% Tween at 37°C, respectively, followed by
a 5 min PBS wash, was found to be sufficient for washing the slides after hybridization with
DNA-LNA mixmers. There was no specific difference between washes with 50% formamide at
42°C and at 60°C.

The signals faded away in most of the slides within two days. When hybridized with directly
labeled LNA, the whole slide was stained with Cy3 after three days. Thus, slides had to be
analyzed within 48 hours after hybridization.
C. CONCLUSION

The experiments have demonstrated that LNA substituted oligonucleotides are very efficient
FISH probes. LNA substituted oligonucleotide probes gave strong signals after only 1 hr of
hybridization, and it was possible to omit the use of formamide both from the denaturation and
from the post-hybridization washing steps and still obtain a very good signal-to-noise ratio. The
ability of LNA to hybridize without prior denaturation could be due to a strand invasion
property of LNA, and this warrants further investigation with other LNA probes and under different
conditions. Based on the combined results of these experiments, the optimal LNA-FISH
procedure was defined as follows: 6.4 pmoles of Cy3 labeled LNA-2 probe was denatured
together with the target at 75°C for 5 minutes and hybridized for one hour, followed by a
short post wash without any formamide (3x 5 minutes 0.1xSSC at 60°C; 2x 5 minutes
4xSSC/0.05% Tween at 37°C; 5 minutes PBS). The FISH experiments indicate that LNA
containing probes would be valuable for the detection of a variety of other repetitive elements,
such as centromeric α-repeats and telomeric repeats. In addition, the superior hybridization
characteristics of LNA containing oligonucleotides could lead to detection of single base pair
differences between repetitive sequences as well as single copy sequences.
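
To make the timing of the optimal procedure explicit, the following Python sketch encodes its steps and adds up the bench time. It is a hedged illustration: the `Step` class and field names are hypothetical, the 37°C hybridization temperature is taken from section B4 rather than from the sentence above, and the PBS wash temperature is left unset because the text does not give it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One protocol step; names and fields are assumptions for this sketch."""
    name: str
    temp_c: Optional[int]  # None where the source gives no temperature
    repeats: int
    minutes_each: int

    @property
    def minutes(self) -> int:
        return self.repeats * self.minutes_each


PROTOCOL = [
    Step("denature probe and target together", 75, 1, 5),
    Step("hybridize", 37, 1, 60),  # temperature from section B4
    Step("wash 0.1xSSC", 60, 3, 5),
    Step("wash 4xSSC/0.05% Tween", 37, 2, 5),
    Step("wash PBS", None, 1, 5),
]


def total_minutes(steps) -> int:
    """Sum of all step durations, ignoring handling time between steps."""
    return sum(step.minutes for step in steps)


def wash_minutes(steps) -> int:
    """Sum of the post-hybridization wash durations only."""
    return sum(step.minutes for step in steps if step.name.startswith("wash"))


total = total_minutes(PROTOCOL)
washes = wash_minutes(PROTOCOL)
```

Under these assumptions the washes take 30 minutes and the whole sequence takes 95 minutes, excluding slide handling, mounting and imaging.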
C1. Figure 51 shows a comparison of different LNA/DNA mixmer oligonucleotides.
Experiment conditions: 6.4 pmoles of Cy3 labeled probe was hybridized for 30 minutes at
37°C, after simultaneous denaturation of the target and the probe at 75°C for 5 minutes. A.
LNA-2 giving signals on chromosomes 1, 16, 9 and 15; B. LNA-3 giving bright signals on
chromosomes 1, 16 and 9; C. Dispersed LNA giving signals on chromosomes 1 and 16 only; D.
LNA Block giving smaller signals on chromosome 1; E. DNA oligo giving no signals on any of
the chromosomes.


Table 19. DNA/LNA mixmers for human satellite 2 repeat sequence used in this study.

Name            FISH probe sequence        LNA monomers   Tm* (°C)
DNA oligo       attccattcgattccattcgatc    0              60
Dispersed LNA   aTtccatTcgaTtccAttcgaTc    5              71
LNA-3           aTtcCatTcgAtTccAttCgaTc    8              77
LNA Blocks      aTTCcattcgATTccattcGATc    9              73
LNA-2           aTtCcAtTcGaTtCcAtTcGaTc    11             84

LNA modifications are depicted in capital letters. *Tm values for each molecule have been
calculated using Exiqon's Tm Prediction program, accessible at http://lna-tm.com/, and as
appears from Figure 27 herein.
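
The Tm values in Table 19 can be related to LNA content with simple arithmetic. The Python sketch below recounts the LNA monomers from the capital-letter notation and computes the average Tm gain per LNA unit relative to the DNA control. It uses only the numbers in the table; the LNA-2 sequence is taken as written in section B1, and the names are hypothetical.

```python
# Data transcribed from Table 19; Tm values are the predicted values listed there.
TABLE_19 = {
    "DNA oligo": ("attccattcgattccattcgatc", 60),
    "Dispersed LNA": ("aTtccatTcgaTtccAttcgaTc", 71),
    "LNA-3": ("aTtcCatTcgAtTccAttCgaTc", 77),
    "LNA Blocks": ("aTTCcattcgATTccattcGATc", 73),
    "LNA-2": ("aTtCcAtTcGaTtCcAtTcGaTc", 84),
}


def lna_count(mixmer: str) -> int:
    """Number of LNA monomers, i.e. capital letters in the sequence."""
    return sum(1 for base in mixmer if base.isupper())


def delta_tm_per_lna(table: dict, control: str = "DNA oligo") -> dict:
    """Average Tm increase per LNA monomer relative to the unmodified control."""
    control_tm = table[control][1]
    result = {}
    for name, (sequence, tm) in table.items():
        count = lna_count(sequence)
        if count:  # skip the control, which has no LNA units
            result[name] = (tm - control_tm) / count
    return result


counts = {name: lna_count(seq) for name, (seq, _) in TABLE_19.items()}
gains = delta_tm_per_lna(TABLE_19)
```

On these figures the dispersed, LNA-3 and LNA-2 designs each gain a little over 2°C of predicted Tm per LNA unit, while the block design gains roughly 1.4°C per unit, which is consistent with the observation in section B1 that the block design performed worst.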

Other Embodiments

From the foregoing description, it will be apparent that variations and modifications
may be made to the invention described herein to adapt it to various usages and conditions.
The foregoing description of the invention is merely illustrative thereof, and it is understood that
variations and modifications can be effected without departing from the scope or spirit of the
invention.

All publications, patent applications, and patents mentioned in this specification are
herein incorporated by reference to the same extent as if each independent publication, patent,
or patent application was specifically and individually indicated to be incorporated by
reference.
Claims

1. A nucleic acid that is a non-naturally-occurring nucleic acid with a melting
temperature that is at least 3°C higher than that of the corresponding control nucleic acid
with 2'-deoxynucleotides; wherein said nucleic acid hybridizes to a first region within a
first exon of a target nucleic acid and to a second region within a second exon of said
target nucleic acid that is adjacent to said first exon.

2. A nucleic acid that is a non-naturally-occurring nucleic acid with a melting
temperature that is at least 3°C higher than that of the corresponding control nucleic acid
with 2'-deoxynucleotides; wherein said nucleic acid hybridizes to a first region within
an exon of a target nucleic acid and to a second region within an intron of said target
nucleic acid that is adjacent to said exon.

3. A nucleic acid that is a non-naturally-occurring nucleic acid with a melting
temperature that is at least 3°C higher than that of the corresponding control nucleic acid
with 2'-deoxynucleotides; wherein said nucleic acid hybridizes to a first region within a
first intron of a target nucleic acid and to a second region within a second intron of said
target nucleic acid that is adjacent to said first intron.

4. A nucleic acid that is a non-naturally-occurring nucleic acid with a capture
efficiency that is at least 10% greater than that of a corresponding control nucleic acid
with 2'-deoxynucleotides at the temperature equal to the melting temperature of said
nucleic acid; wherein said nucleic acid hybridizes to a first region within a first exon of
a target nucleic acid and to a second region within a second exon of said target nucleic
acid that is adjacent to said first exon.

5. A nucleic acid that is a non-naturally-occurring nucleic acid with a capture
efficiency that is at least 10% greater than that of a corresponding control nucleic acid
with 2'-deoxynucleotides at the temperature equal to the melting temperature of said
nucleic acid; wherein said nucleic acid hybridizes to a first region within an exon of a
target nucleic acid and to a second region within an intron of said target nucleic acid that
is adjacent to said exon.

6. A nucleic acid that is a non-naturally-occurring nucleic acid with a capture
efficiency that is at least 10% greater than that of a corresponding control nucleic acid
with 2'-deoxynucleotides at the temperature equal to the melting temperature of said
nucleic acid; wherein said nucleic acid hybridizes to a first region within a first intron of
a target nucleic acid and to a second region within a second intron of said target nucleic
acid that is adjacent to said first intron.

7. A nucleic acid that is an LNA and that hybridizes to a first region within a
first exon of a target nucleic acid and to a second region within a second exon of said
target nucleic acid that is adjacent to said first exon.

8. A nucleic acid that is an LNA and that hybridizes to a first region within an
exon of a target nucleic acid and to a second region within an intron of said target
nucleic acid that is adjacent to said exon.

9. A nucleic acid that is an LNA and that hybridizes to a first region within a
first intron of a target nucleic acid and to a second region within a second intron of said
target nucleic acid that is adjacent to said first intron.

10. The nucleic acid of any one of claims 1-9, wherein the length of said first
region and the length of said second region are between 3 and 50 nucleotides, inclusive.

11. The nucleic acid of claim 10, wherein the length of said first region and the
length of said second region are between 10 and 40 nucleotides, inclusive.

12. The nucleic acid of claim 11, wherein the length of said first region and the
length of said second region are between 20 and 30 nucleotides, inclusive.

13. The nucleic acid of any one of claims 1-9, wherein said first region and said
second region are the same length.

14. The nucleic acid of any one of claims 1-9, wherein said first region and said
second region are a different length.

15. A population of nucleic acids that includes a nucleic acid of any one of
claims 1-14.

16. A population of nucleic acids that includes a non-naturally-occurring nucleic acid
with a melting temperature that is at least 3°C higher than that of the corresponding control
nucleic acid with 2'-deoxynucleotides; wherein said nucleic acid hybridizes to only one exon or
to only one intron of a target nucleic acid.

17. A population of nucleic acids that includes a non-naturally-occurring nucleic acid
with a capture efficiency that is at least 10% greater than that of a corresponding control nucleic
acid with 2'-deoxynucleotides at the temperature equal to the melting temperature of said
nucleic acid; wherein said nucleic acid hybridizes to only one exon or to only one intron of a
target nucleic acid.

18. A population of nucleic acids that includes a nucleic acid that is an LNA and
that hybridizes to only one exon or to only one intron of a target nucleic acid.
19. A nucleic acid of any one of claims 1-9 that is a non-naturally-occurring nucleic
acid with a melting temperature that is at least 3°C higher than that of the
corresponding control nucleic acid with 2'-deoxynucleotides; wherein said
nucleic acid hybridizes to only one exon or to only one intron of a target nucleic
acid.

20. A nucleic acid of any one of claims 1-9 that is a non-naturally-occurring nucleic
acid with a capture efficiency that is at least 10% greater than that of a
corresponding control nucleic acid with 2'-deoxynucleotides at the temperature
equal to the melting temperature of said nucleic acid; wherein said nucleic acid
hybridizes to only one exon or to only one intron of a target nucleic acid.

21. A nucleic acid of any one of claims 1-9 that comprises an LNA and that
hybridizes to only one exon or to only one intron of a target nucleic acid.

22. The population of any one of claims 15-18, wherein said nucleic acid is between
15 and 150 nucleotides in length, inclusive.

23. The population of claim 22, wherein said nucleic acid is between 5 and 100
nucleotides in length, inclusive.

24. The population of claim 23, wherein said nucleic acid is between 20 and 80
nucleotides in length, inclusive.

25. The population of claim 24, wherein said nucleic acid is between 30 and 60
nucleotides in length, inclusive.

26. The population of claim 25, wherein said nucleic acid is 40 nucleotides in
length.

27. The population of claim 25, wherein said nucleic acid is 50 nucleotides in
length.

28. The nucleic acid of any one of claims 1-9 and 19-21 wherein said nucleic acid is
between 8 and 70 nucleotides in length.

29. The nucleic acid of any one of claims 1-9 and 19-21 wherein said nucleic acid is
between 9 and 50 nucleotides in length.

30. The nucleic acid of any one of claims 1-9 and 19-21 wherein said nucleic acid is
between 12 and 40 nucleotides in length.

31. The nucleic acid of any one of claims 1-9 and 19-21 wherein said nucleic acid is
between 15 and 35 nucleotides in length.

32. The population of any one of claims 15-18, wherein at least 5% of the 
nucleotides in said nucleic acid are LNA units. 
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33. The population of claim 32, wherein at least 10% of the nucleotides in said 
nucleic acid are LNA units. 

34. The population of claim 32, wherein at least 20% of the nucleotides in said 
nucleic acid are LNA units. 

5 35. The population of any one of claims 15-18, wherein every second nucleotide in 
said nucleic acid is an LNA unit. 

36. The population of any one of claims 15-18, wherein every third nucleotide in 
said nucleic acid is an LNA unit. 

37. The population of any one of claims 15-18, wherein every fourth nucleotide in 
10 said nucleic acid is an LNA unit. 

38. The population of any one of claims 15-18, wherein every Fifth nucleotide in 
said nucleic acid is an LNA unit. 

39. The population of any one of claims 15-18, wherein every sixth nucleotide in 

said nucleic acid is an LNA unit. 

15 40. The population of any one of claims 15-18, wherein (i) every second and every 
third nucleotide, (ii) every second and every fourth nucleotide, (iii) every 
second and every fifth nucleotide, (iv) every second and every sixth nucleotide, 
(v) every third and every fourth nucleotide, (vi) every third and every fifth 
nucleotide, (vii) every third and every sixth nucleotide, (viii) every fourth and 

20 every fifth nucleotide, (ix) every fourth and every sixth nucleotide, and/or (x) 

every fifth and every sixth nucleotide in said nucleic acid is an LNA unit. 

41. The population of claim 40, wherein every second, every third, and every fourth 
nucleotide in said nucleic acid is an LNA unit. 

42. The population of any one of claims 15-18, wherein said nucleic acid comprises 
25 two or more contiguous LNA units. 

43. The population of claim 42, wherein said nucleic acid comprises at least 4 
contiguous LNA units. 

44. The population of claim 42, wherein said nucleic acid comprises at least 5 
contiguous LNA units. 

30 45. The population of claim 42, wherein the number of contiguous LNA units is 
between 5 and 20% of the total length of said nucleic acid. 

46. The population of claim 45, wherein the number of contiguous LNA units is 
between 10 and 15% of the total length of said nucleic acid. 
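
As a minimal sketch of how the contiguity limits in claims 42-46 might be checked, the following Python snippet measures the longest run of LNA units and expresses it as a percentage of the total length. It assumes the capital-letter notation for LNA units and one character per nucleotide; `longest_lna_run` and `run_percent_of_length` are hypothetical helper names.

```python
def longest_lna_run(seq):
    """Length of the longest stretch of contiguous LNA units (capitals)."""
    longest = current = 0
    for base in seq:
        current = current + 1 if base.isupper() else 0
        longest = max(longest, current)
    return longest


def run_percent_of_length(seq):
    """Longest contiguous LNA run as a percentage of the total length."""
    return 100.0 * longest_lna_run(seq) / len(seq)


if __name__ == "__main__":
    probe = "a" * 45 + "CATGC"  # hypothetical 50-mer with 5 contiguous LNA units
    print(longest_lna_run(probe), run_percent_of_length(probe))
```

A 50-mer with five contiguous LNA units gives 10%, which falls within the ranges recited in claims 45 and 46.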

47. The population of claim 42, wherein at least one LNA unit in said nucleic acid
hybridizes to a first region within a first exon of a target nucleic acid and at least
one LNA unit in said nucleic acid hybridizes to a second region within a second
exon of said target nucleic acid that is adjacent to said first exon.

48. The population of claim 47, wherein at least two LNA units in said nucleic acid 
hybridize to a first region within a first exon of a target nucleic acid and at least
two LNA units in said nucleic acid hybridize to a second region within a second
exon of said target nucleic acid that is adjacent to said first exon.

49. The population of claim 48, wherein at least three LNA units in said nucleic 
acid hybridize to a first region within a first exon of a target nucleic acid and at 
least three LNA units in said nucleic acid hybridize to a second region within a
second exon of said target nucleic acid that is adjacent to said first exon.

50. The population of any one of claims 15-18, wherein the 5' terminal nucleotide of
said nucleic acid is not an LNA unit.

51. The population of any one of claims 15-18, wherein the 5' terminal nucleotide of
said nucleic acid is an LNA unit.

52. The population of any one of claims 15-18, wherein the 3' terminal nucleotide of
said nucleic acid is not an LNA unit.

53. The population of any one of claims 15-18, wherein said nucleic acid can
distinguish between different nucleic acids that cannot be distinguished using a
naturally-occurring control nucleic acid.

54. The population of claim 53, wherein said control nucleic acid consists of only
2'-deoxynucleotides.

55. The population of claim 53, wherein said different nucleic acids are mRNA 
splice variants. 

56. The population of any one of claims 15-18, wherein said nucleic acid comprises 
one or more universal bases.

57. The population of claim 56, wherein said universal base is located at the 5' or 3' 
terminus of said nucleic acid. 

58. The population of claim 56, wherein one or more universal bases are located at
the 5' and 3' termini of said nucleic acid. 

59. The population of claim 56, wherein all of said nucleic acids of said population
have the same number of universal bases. 

60. The population of claim 56, wherein said universal base is inosine, pyrene,
3-nitropyrrole, or 5-nitroindole.

61. The population of any one of claims 15-18, wherein said nucleic acid comprises
at least one LNA A or LNA T.

62. The population of any one of claims 15-18, wherein each nucleic acid in said
population comprises at least one LNA A or LNA T.

63. The population of claim 61, wherein all of the adenine and thymine-containing
nucleotides in said LNA are LNA A and LNA T, respectively.

64. The population of any one of claims 15-18, wherein said nucleic acid comprises
at least one 2,6-diaminopurine or 2-thio-thymine base.

65. The population of any one of claims 15-18, wherein at least 5% of the nucleic 
acids in said population are LNA. 

66. The population of any one of claims 15-18, wherein at least 10% of the nucleic
acids in said population are LNA.

67. The population of any one of claims 15-18, that includes nucleic acids that 
together hybridize to at least 10% of the nucleic acids expressed by a particular 
cell or tissue. 

68. The population of claim 67, that includes nucleic acids that together hybridize to 
at least 50% of the exons of a target nucleic acid.

69. The population of any one of claims 15-18, wherein said nucleic acid does not 
form a hairpin that would otherwise inhibit its binding to a target nucleic acid. 

70. The population of any one of claims 15-18, wherein opposing nucleotides in a 
palindrome pair or opposing nucleotides in inverted repeats of said nucleic acid
are not both LNA units.

71. The population of any one of claims 15-18, wherein said nucleic acid forms less 
than 3 intramolecular base-pairs. 
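
The limit on intramolecular base-pairs in claim 71 can be approximated computationally. The sketch below is not the applicants' method: it uses a Nussinov-style dynamic program that returns the maximum number of nested Watson-Crick pairs a sequence could form, which is an upper bound rather than a thermodynamic prediction. The minimum hairpin loop of 3 nucleotides, the restriction to A-T and G-C pairs, and the function name `max_intramolecular_pairs` are all assumptions, and LNA-specific stabilisation is ignored.

```python
def max_intramolecular_pairs(seq, min_loop=3):
    """Upper bound on nested intramolecular Watson-Crick base-pairs.

    Nussinov-style dynamic programming; `min_loop` is the assumed minimum
    number of unpaired nucleotides enclosed by any base-pair.
    """
    s = seq.lower()
    n = len(s)
    if n == 0:
        return 0
    pairs = {("a", "t"), ("t", "a"), ("g", "c"), ("c", "g")}
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):
                if (s[k], s[j]) in pairs:  # case: k pairs with j
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    for probe in ("gggaaaaccc", "gacatagg", "aaaaaaaa"):
        print(probe, max_intramolecular_pairs(probe))
```

A candidate whose bound is below 3 necessarily satisfies the limit; a higher bound only flags the sequence for closer inspection.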

72. The population of any one of claims 15-18, wherein said nucleic acid does not 
have LNA-5-nitroindole: LNA-5-nitroindole intramolecular base-pairs. 

73. The population of any one of claims 15-18, wherein said nucleic acid has an LNA
unit with a 2,6-diaminopurine, 2-aminopurine, 2-thio-thymine, 2-thio-uracil,
inosine, or hypoxanthine base.

74. The population of any one of claims 15-18, wherein said nucleic acid has a
2'-O,4'-C-methylene linkage.

75. The population of any one of claims 15-18, wherein one or more nucleic acids
are LNA/DNA, LNA/RNA, or LNA/DNA/RNA chimeras.

76. The population of any one of claims 15-18, wherein said nucleic acid has a
melting temperature that is at least 10°C higher than that of the corresponding
control nucleic acid with 2'-deoxynucleotides.

77. The population of claim 76, wherein said nucleic acid has a melting temperature
that is at least 20°C higher than that of the corresponding control nucleic acid
with 2'-deoxynucleotides.

78. The population of any one of claims 15-18, wherein said nucleic acid has a
capture efficiency that is at least 100% greater than that of the corresponding
control nucleic acid with 2'-deoxynucleotides at the temperature equal to the
melting temperature of said nucleic acid.

79. The population of claim 78, wherein said nucleic acid has a capture efficiency 
that is at least 400% greater than that of the corresponding control nucleic acid
with 2'-deoxynucleotides at the temperature equal to the melting temperature of
said nucleic acid.

80. The population of any one of claims 15-18, wherein said nucleic acids are 
covalently bonded to a solid support. 

81. The population of claim 80, wherein said nucleic acids are in a predefined 
arrangement.

82. The population of any one of claims 15-18, comprising at least 10 different 
nucleic acids. 

83. The population of claim 82, comprising at least 100 different nucleic acids.

84. The population of claim 83, comprising at least 500 different nucleic acids.

85. The population of claim 84, comprising at least 1,000 different nucleic acids.

86. The population of claim 85, comprising at least 5,000 different nucleic acids. 

87. A complex of one or more target nucleic acids and one or more nucleic acids or 
populations of any one of claims 1-9, 15-18 or 19-21 in which one or more 
target nucleic acids are hybridized to one or more said nucleic acids or
populations.

88. The complex of claim 87, wherein at least 10 different target nucleic acids are 
hybridized. 

89. The complex of claim 87, wherein said target nucleic acids are cDNA molecules 
reverse transcribed from a patient sample. 

90. The complex of claim 87, wherein said target nucleic acids are cDNA molecules
reverse transcribed from a patient sample and fragmented using E. coli Uracil- 
DNA Glycosylase. 

91. The complex of claim 87, wherein said target nucleic acids are cDNA molecules
reverse transcribed from a patient sample and fragmented using E. coli
Uracil-DNA Glycosylase to an average size of 300 nucleotides.

92. The complex of claim 87, wherein the target nucleic acids are cDNA molecules
reverse transcribed from a patient sample and fragmented using E. coli Uracil- 
DNA Glycosylase to an average size of 200 nucleotides. 

93. The complex of claim 87, wherein the target nucleic acids are cDNA molecules
reverse transcribed from a patient sample and fragmented using E. coli
Uracil-DNA Glycosylase to an average size of 100 nucleotides.

94. The complex of claim 87, wherein the target nucleic acids are cDNA molecules 
reverse transcribed from a patient sample and fragmented using E. coli Uracil- 
DNA Glycosylase to an average size of 50 nucleotides. 

95. The complex of claim 87, wherein said target nucleic acids are cRNA molecules
amplified from a patient sample. 

96. The complex of claim 87, wherein said target nucleic acids are cRNA molecules 
amplified from a patient sample and fragmented using alkaline hydrolysis. 

97. The complex of claim 87, wherein at least 10 different target nucleic acids are 
labeled and hybridized.

98. The complex of claim 87, wherein said target nucleic acids are RNA molecules
isolated from a patient sample and labeled chemically directly, e.g. using
platinum-linked fluorescent dyes.

99. The complex of claim 87, wherein said target nucleic acids are RNA molecules
isolated from a patient sample and labeled chemically directly, e.g. using
platinum-linked fluorescent dyes and fragmented using alkaline hydrolysis.

100. A method for detecting the presence of one or more target nucleic acids in a 
sample, said method comprising incubating a labeled nucleic acid sample with 
one or more first nucleic acids of any one of claims 1-9 or 19-21 or one or more
populations of first nucleic acids of any one of claims 15-18 under conditions
that allow at least one target nucleic acid to hybridize to at least one of said first
nucleic acids.

101. The method of claim 100, wherein hybridization is detected between at least 5 
target nucleic acids and said first nucleic acids. 

102. The method of claim 100, further comprising identifying one or more
hybridized target nucleic acids. 

103. The method of claim 100, further comprising determining the amount of one or 
more hybridized target nucleic acids. 

104. The method of claim 100, wherein one or more target nucleic acids are labeled 
with a fluorescent group, and wherein the determination of the amount of said
hybridized target nucleic acid involves one or more of the following: (i)
adjusting for the varying intensity of an excitation light source used for
detection of said hybridization, (ii) adjusting for photobleaching of said
fluorescent group, and/or (iii) comparing the fluorescent intensity of said
hybridized target nucleic acid(s) to the fluorescent intensity of a different
sample of hybridized nucleic acids.

105. The method of claim 100, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample. 

106. The method of claim 105, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample and fragmented using
E. coli Uracil-DNA Glycosylase.

107. The method of claim 105, wherein said target nucleic acids are cDNA 
molecules reverse transcribed from a patient sample and fragmented using
E. coli Uracil-DNA Glycosylase to an average size of 300 nucleotides.

108. The method of claim 105, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample and fragmented using
E. coli Uracil-DNA Glycosylase to an average size of 200 nucleotides.

109. The method of claim 105, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample and fragmented using
E. coli Uracil-DNA Glycosylase to an average size of 100 nucleotides.

110. The method of claim 105, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample and fragmented using
E. coli Uracil-DNA Glycosylase to an average size of 50 nucleotides.

111. The method of claim 105, wherein said target nucleic acids are cRNA molecules
amplified from a patient sample that is optionally fragmented using alkaline
hydrolysis.

112. The method of claim 100, further comprising determining the presence or
absence of an mRNA splice variant of interest in said sample or determining the
presence or absence of a mutation, deletion, and/or duplication of an exon of
interest.

113. The method of claim 100, wherein the sample has nucleic acids that are
amplified using one or more primers specific for an exon of a target nucleic 
acid, and wherein said method involves determining the presence or absence of 
an mRNA splice variant with said exon in said sample. 

114. The method of claim 113, wherein one or more of said primers are specific for
an exon or exon-exon junction of a pathogen of interest, and said method 
involves determining the presence or absence of a nucleic acid with said exon in 
said sample. 

115. The method of claim 100, wherein said first nucleic acids are covalently bonded
to a solid support by reaction of a nucleoside phosphoramidite with an activated 
solid support, and subsequent reaction of a nucleoside phosphoramide with an 
activated nucleotide or nucleic acid bound to said solid support. 

116. The method of claim 100, further comprising contacting said target nucleic acid
with a second nucleic acid or a population of second nucleic acids that binds to
a different region of the target molecule than said first nucleic acid.

117. A method for amplifying a target nucleic acid, said method comprising the steps
of: (a) incubating one or more first nucleic acids of any one of claims 1-9 or
19-21 or one or more populations of first nucleic acids of any one of claims 15-18
with a target nucleic acid under conditions that allow said first nucleic acid to
bind said target nucleic acid; and (b) extending said first nucleic acid with said
target nucleic acid as a template.

118. The method of claim 117, further comprising contacting said target nucleic acid
with a second nucleic acid or a population of second nucleic acids that binds to
a different region of the target molecule than said first nucleic acid.

119. A method for classifying a test nucleic acid sample comprising a target nucleic
acid, said method comprising the steps of: (a) incubating a test nucleic acid
sample with one or more nucleic acid probes of any one of claims 1-9 or 19-21
or one or more populations of nucleic acid probes of any one of claims 15-18
under conditions that allow at least one of the nucleic acids in said test sample
to hybridize to at least one nucleic acid probe; (b) detecting a hybridization
pattern of said test nucleic acid sample; and (c) comparing said hybridization
pattern to a hybridization pattern of a first nucleic acid standard, whereby said
comparison indicates whether or not said test sample has the same classification
as said first standard.

120. The method of claim 119, further comprising comparing a hybridization pattern
of said test nucleic acid sample to a hybridization pattern of a second standard. 
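
The comparison step of claims 119 and 120 can be sketched in Python. The claims do not specify a similarity measure, so the use of the Pearson correlation coefficient here is an assumption, as are the function names `pearson` and `classify` and the labels given to the standards. Each hybridization pattern is represented as a list of signal intensities, one per probe, in the same probe order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors.

    Assumes both vectors have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def classify(test_pattern, standards):
    """Return the label of the standard whose pattern correlates best."""
    return max(standards, key=lambda label: pearson(test_pattern, standards[label]))


if __name__ == "__main__":
    # Hypothetical intensities for four probes.
    standards = {
        "first standard": [2.0, 4.0, 6.0, 8.0],
        "second standard": [4.0, 3.0, 2.0, 1.0],
    }
    print(classify([1.0, 2.0, 3.0, 4.0], standards))
```

Because correlation is insensitive to overall scaling, this measure tolerates differences in total signal between the test sample and a standard; other distance measures could be substituted.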

121. The method of claim 119, wherein at least 5 target nucleic acids hybridize to
said nucleic acid probes. 

122. The method of claim 119, further comprising identifying a hybridized target
nucleic acid. 

123. The method of claim 119, further comprising determining the amount of said
hybridized target nucleic acid. 

124. The method of claim 119, wherein said target nucleic acids are labeled with
fluorescent groups. 

125. The method of claim 124, wherein said determination comprises scaling for the
varying labeling efficiency for the different fluorescent groups used for
detection of said hybridization.

126. The method of claim 124, wherein said determination comprises adjusting for
the varying intensity of the excitation light source used for detection of said
hybridization.

127. The method of claim 124, wherein said determination comprises adjusting for
photobleaching of said fluorescent group.

128. The method of claim 124, wherein said comparison comprises adjusting for a
difference in the amount of said nucleic acid probes used for hybridization to
said test sample and said first standard.

129. The method of claim 124, wherein said comparison comprises adjusting for a 
difference in the buffer used for hybridization to said test sample and said first
standard.

130. The method of claim 129, wherein said difference is a difference in Mg2+
concentration. 

131. The method of claim 124, wherein said first nucleic acid standard is labeled 
with a different fluorescent group. 

132. The method of claim 119, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample and optionally fragmented
using E. coli Uracil-DNA Glycosylase.

133. The method of claim 119, wherein said target nucleic acids are cDNA 
molecules reverse transcribed from a patient sample and fragmented using
E. coli Uracil-DNA Glycosylase to an average size of 300 nucleotides.

134. The method of claim 119, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample and fragmented using E. 
coli Uracil-DNA Glycosylase to an average size of 200 nucleotides. 

135. The method of claim 119, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample and fragmented using E. 
coli Uracil-DNA Glycosylase to an average size of 100 nucleotides. 

136. The method of claim 119, wherein said target nucleic acids are cDNA
molecules reverse transcribed from a patient sample and fragmented using
E. coli Uracil-DNA Glycosylase to an average size of 50 nucleotides.

137. The method of claim 119, wherein said target nucleic acids are cRNA
molecules amplified from a patient sample. 

138. The method of claim 119, wherein said target nucleic acids are cRNA
molecules amplified from a patient sample and fragmented using alkaline
hydrolysis.

139. The method of claim 119, further comprising determining the presence or
absence of an mRNA splice variant of interest in said sample. 

140. The method of claim 119, further comprising determining the presence or
absence of a mutation, deletion, and/or duplication of an exon of interest.

141. The method of claim 140, wherein said mutation, deletion, and/or duplication is 
indicative of a disease, disorder, or condition. 

142. The method of claim 141, wherein said disease is cancer.

143. The method of claim 119, wherein the sample has nucleic acids that are
amplified using one or more primers specific for an exon of a target nucleic
acid, and wherein said method involves determining the presence or absence of
an mRNA splice variant with said exon in said sample.

144. The method of claim 143, wherein one or more of said primers are specific for 
an exon or exon-exon junction of a pathogen of interest, and said method
involves determining the presence or absence of a nucleic acid with said exon in
said sample.

145. The method of claim 119, wherein said first nucleic acids are covalently
bonded to a solid support by reaction of a nucleoside phosphoramidite with an
activated solid support, and subsequent reaction of a nucleoside phosphoramide
with an activated nucleotide or nucleic acid bound to said solid support.

146. A method of selecting a nucleic acid for a population of nucleic acids, said
method comprising the steps of: (a) determining the melting temperature of a
nucleic acid, determining the ability of said nucleic acid to self-anneal,
determining the ability of said nucleic acid to hybridize to one or more exons or
introns of a target nucleic acid, and/or determining the ability of said nucleic
acid to hybridize to a non-target nucleic acid, and (b) selecting said nucleic
acid for inclusion or exclusion from said population based on the determination
in step (a), wherein said nucleic acid is a nucleic acid of any one of claims 1-9
or 19-21 or a nucleic acid that has at least one LNA unit and that hybridizes to
only one exon or to only one intron of a target nucleic acid.
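
The selection method of claim 146 can be pictured as a filter over candidate probes. The sketch below assumes that the determinations of step (a) have already been made by some upstream measurement or prediction and are supplied as plain numbers. The `Candidate` fields, the `select` function and all threshold values are hypothetical choices for illustration; the claim itself recites no numerical cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    tm_c: float           # melting temperature in degrees C (step a)
    self_pairs: int       # intramolecular base-pairs, i.e. self-annealing (step a)
    exons_hit: int        # number of target exons the probe hybridizes to (step a)
    off_target_hits: int  # number of non-target nucleic acids bound (step a)


def select(candidates, tm_min=60.0, tm_max=75.0, max_self_pairs=2):
    """Step (b): keep candidates that pass every illustrative threshold."""
    return [
        c for c in candidates
        if tm_min <= c.tm_c <= tm_max
        and c.self_pairs <= max_self_pairs
        and c.exons_hit == 1          # hybridizes to only one exon
        and c.off_target_hits == 0
    ]


if __name__ == "__main__":
    pool = [
        Candidate("probe_a", 68.0, 1, 1, 0),
        Candidate("probe_b", 55.0, 0, 1, 0),  # Tm too low
        Candidate("probe_c", 70.0, 4, 1, 0),  # self-anneals
        Candidate("probe_d", 66.0, 0, 2, 0),  # spans two exons
    ]
    print([c.name for c in select(pool)])
```

Keeping the criteria as independent fields makes it straightforward to apply only a subset of them, in line with the "and/or" wording of step (a).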

147. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 for the detection, amplification, or 
classification of a nucleic acid of interest or a population of nucleic acids of 
interest. 

148. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of
nucleic acids of any one of claims 15-18 for alternative mRNA splice variant
detection, expression profiling, comparative genomic hybridization, or real-time 
PCR. 

149. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 as a PCR primer or FISH probe.

150. Use according to claim 149 wherein said nucleic acid or said population of 
nucleic acids comprise LNA in about every other or about every third position 
of the nucleotide sequence. 

151. Use according to claim 149 wherein said nucleic acid or said population of 
nucleic acids are used for the detection of a repetitive element.

152. Use according to claim 151 wherein said repetitive element is a centromeric 
alpha-repeat or a telomeric repeat. 

153. Use according to claim 149 wherein said nucleic acid or said population of 
nucleic acids are used for the detection of a single base pair difference between
repetitive sequences or for the detection of single copy sequences.

154. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 as a probe in fluorescent in situ 
hybridisation of a target DNA without a denaturation step.

155. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of
nucleic acids of any one of claims 15-18 in a homogeneous assay such as
quantitative real-time RT-PCR assay, e.g. Taqman real-time RT-PCR,
Lightcycler real-time RT-PCR, or real-time NASBA (nucleic acid
sequence-based amplification) assay with a Molecular Beacon-based detection.

156. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 in a diagnostic kit based on a
homogeneous assay such as quantitative real-time RT-PCR assay, e.g. Taqman
real-time RT-PCR, Lightcycler real-time RT-PCR, or real-time NASBA
(nucleic acid sequence-based amplification) assay with a Molecular
Beacon-based detection.

157. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of
nucleic acids of any one of claims 15-18 as an antisense, antigene or ribozyme
or double stranded nucleic acid therapeutic agent, or in therapeutic use as 
modulating and silencing sense nucleic acid agents. 

158. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of
nucleic acids of any one of claims 15-18 as an antisense, antigene or ribozyme
or double stranded nucleic acid therapeutic agent, in which the said nucleic acid
or population of nucleic acids hybridize to a specific splice isoform or isoforms.

159. Use of a nucleic acid or a population of nucleic acids of any one of claims 1-9 
or 19-21 or of claims 15-18 as an antisense, antigene or ribozyme or double
stranded nucleic acid therapeutic agent, in which the said nucleic acid or
population of nucleic acids hybridize to a non-coding antisense RNA or RNAs.

160. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 in a diagnostic kit, based on detection 
of diagnostic splice isoforms or splice patterns, or diagnostic nucleic acids, such
as viral or bacterial mRNAs.

161. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 in a diagnostic microarray based on
detection of mRNA signatures or splice isoform signatures in the concentration
range of 10000 ppm to 1000 ppm. 

162. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of
nucleic acids of any one of claims 15-18 in a diagnostic microarray based on
detection of mRNA signatures or splice isoform signatures in the concentration 
range of 1000 ppm to 100 ppm. 

163. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 in a diagnostic microarray based on
detection of mRNA signatures or splice isoform signatures in the concentration
range of 100 ppm to 10 ppm. 

164. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 in a diagnostic microarray based on
detection of mRNA signatures or splice isoform signatures in the concentration
range of 10 ppm to 1 ppm.

165. Use of a nucleic acid of any one of claims 1-9 or 19-21 or a population of 
nucleic acids of any one of claims 15-18 in a diagnostic microarray based on
detection of mRNA signatures or splice isoform signatures.

166. Use of a non-naturally occurring nucleic acid of any one of claims 1-9 or 19-21 
or use of a population of non-naturally occurring nucleic acids of any one of 
claims 15-18 in a method for generating an array of said non-naturally occurring 
nucleic acids of chosen lengths within discrete locations of a support material. 

167. Use according to claim 166, where said non-naturally occurring nucleic acids
are LNA oligonucleotides. 

168. Use according to any one of claims 166-167, where said non-naturally occurring 
nucleic acids are LNA oligonucleotides containing a photochemically active 
group at the 5' end or the 3' end, separated by a linker from the LNA
oligonucleotides.

169. Use according to any one of claims 166-168, wherein said linker is a 
hexaethylene monomer, dimer, trimer, tetramer, pentamer or hexamer, or a
poly-T sequence of 10-50 nucleotides in length, or a poly-C sequence of 10-50 
nucleotides in length. 

170. Use according to any one of claims 168-169 for generating an array comprising
the steps of a) selecting said nucleic acid or population of nucleic acids
according to the method of claim 146; b) synthesis of the said nucleic acids
using phosphoramidite chemistry and DNA and LNA phosphoramidites; c)
purification of the said nucleic acids; d) printing of the said nucleic acids of
chosen lengths and sequence onto a polymer surface; and e) coupling of said
nucleic acids via excitation of the photochemically active group at the 5' end or
the 3' end covalently onto the polymer surface using UV light.

171. The method of claims 166-170 wherein a computer-controlled microarray 
printing robot delivers the said nucleic acids onto discrete locations on the
polymer surface.

172. The method of claims 166-171, wherein the size of each discrete location is 
between an average size of 10 and 150 microns. 

173. The method of claims 166-172, wherein said nucleic acids are printed at a 
density of at least 50 nucleic acids per square cm. 


LNA21/ SKA/MSL 


5/19/2003 


238 

174. The method of claims 166-172, wherein said nucleic acids are printed at a 
density of at least 100 nucleic acids per square cm. 

175. The method of claims 166-172, wherein said nucleic acids are printed at a 
density of at least 500 nucleic acids per square cm. 

176. The method of claims 166-172, wherein said nucleic acids are printed at a
density of at least 1000 nucleic acids per square cm. 

177. The method of claims 166-172, wherein said nucleic acids are printed at a 
density of at least 2500 nucleic acids per square cm. 

178. The method of claims 166-172, wherein said nucleic acids are printed at a
density of at least 4000 nucleic acids per square cm.
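
To relate the printing densities of claims 173-178 to the spot sizes of claim 172, the following sketch converts a density in spots per square centimetre into the centre-to-centre spacing of a regular square grid. The square-grid layout and the function name `grid_pitch_um` are assumptions made for illustration; the claims do not prescribe a layout.

```python
from math import sqrt


def grid_pitch_um(density_per_cm2):
    """Centre-to-centre spacing (micrometres) of a square grid that holds
    `density_per_cm2` spots per square centimetre (1 cm = 10,000 um)."""
    return 10_000.0 / sqrt(density_per_cm2)


if __name__ == "__main__":
    for density in (50, 100, 500, 1000, 2500, 4000):
        print(density, round(grid_pitch_um(density), 1))
```

Under this assumption, 4000 spots per square cm corresponds to a pitch of roughly 158 microns, so spots at the upper end of the 10-150 micron range would nearly touch at that density.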

OLIGONUCLEOTIDES USEFUL FOR DETECTING AND ANALYZING NUCLEIC ACIDS
OF INTEREST 


Abstract of the Disclosure 
The invention features improved nucleic acids and methods for expression profiling of mRNAs,
identifying and profiling of particular mRNA splice variants, and detecting mutations, 
deletions, or duplications of particular exons or other splice variants, e.g., alterations associated 
with a disease such as cancer, in a nucleic acid sample, e.g., a biological sample or a patient 
sample. 

Melting temperatures (Tm) of the complementary DNA-DNA and LNA-DNA duplexes. a
Modified monomers (LNA are in CAPITALS): I = inosine; D = 2,6-diaminopurine; X = 2-aminopurine.

                           Tm (± 0.5 °C) of the duplexes with complementary
                           deoxynucleotide
Entry  Oligonucleotide     3'-ctgtatcc   3'-ctgaatcc   3'-ctggatcc   3'-ctgcatcc
       structure

1      5'-gacatagg         23.8          <10           <10           <10
2      5'-gacttagg         <10           22.6          <10           <10
3      5'-gacgtagg         <10           <10           <10           25.0
4      5'-gacdtagg         23.3          <10           <10           <10
5      5'-gdcdtdgg         33.4          <10           <10           17.7
6      5'-gacitagg         <10           <10           <10           20.9
7      5'-gacxtagg         <10           <10           <10           <10
8      5'-GACATAGG         61.6          38.2          43.4          40.6
9      5'-GACTTAGG         28.0          60.7          36.4          23.5
10     5'-GACGTAGG         55.0          32 b          41 b          70.9
11     5'-GACDTAGG         67.8          42.2          41.4          52.4
12     5'-GDCDTDGG         78.3          55.9          54.7          63.8
13     5'-GACITAGG         53.1          48.2          43.0          59.9
14     5'-GACXTAGG         60.8          45.5          44.0          53.9

a The melting temperatures (Tm values) were obtained as the maxima of the first derivatives of the
corresponding melting curves (optical density at 260 nm versus temperature). Concentration of the
duplexes: 2.5 µM. Buffer: 0.1 M NaCl; 10 mM Na-phosphate (pH 7.0); 1 mM EDTA.
b Low cooperativity of transitions (accuracy ± 1 °C).
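
The table values can be used for a small worked calculation. The Python sketch below takes the Tm values of entries 1-3 (DNA) and 8-10 (LNA) from the table and computes two derived quantities: the Tm increase of the fully matched LNA duplex over its DNA counterpart, and the gap between the matched Tm and the highest mismatched Tm for each LNA oligonucleotide. The variable and function names are hypothetical, and the derived numbers are simple arithmetic on the table rather than results reported in the source.

```python
# Matched-duplex Tm (degrees C) from the table: (DNA entry, LNA entry).
TM_MATCH = {
    "gacatagg": (23.8, 61.6),  # entries 1 and 8
    "gacttagg": (22.6, 60.7),  # entries 2 and 9
    "gacgtagg": (25.0, 70.9),  # entries 3 and 10
}

# Full rows for the LNA oligonucleotides (entries 8-10), in column order.
LNA_ROWS = {
    "GACATAGG": [61.6, 38.2, 43.4, 40.6],
    "GACTTAGG": [28.0, 60.7, 36.4, 23.5],
    "GACGTAGG": [55.0, 32.0, 41.0, 70.9],
}


def delta_tm(dna_tm, lna_tm):
    """Tm increase of the LNA duplex over the DNA duplex."""
    return round(lna_tm - dna_tm, 1)


def mismatch_gap(row):
    """Matched Tm minus the highest mismatched Tm in the same row."""
    ordered = sorted(row, reverse=True)
    return round(ordered[0] - ordered[1], 1)


if __name__ == "__main__":
    for seq, (dna, lna) in TM_MATCH.items():
        print(seq, delta_tm(dna, lna))
    for seq, row in LNA_ROWS.items():
        print(seq, mismatch_gap(row))
```

For these three sequences the matched LNA duplexes melt more than 20 °C above the corresponding DNA duplexes, which is the kind of difference recited in claims 76 and 77.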

wherein

X is N or CH;

Y is O or S;

Z is OH or CH3;

R is H, F, or OR2, where R2 is H, C1-6 alkyl or allyl, and
R1 is C1-4 alkyl, C1-4 alkoxy, C1-4 alkylthio, F, or NHR3 where R3
is H, or C1-4 alkyl, and where the 8 position of the purine, the 3
position of the pyrazolopyrimidine or the 5 position of the
pyrrolopyrimidine optionally serve as a point of attachment for a
cross-linking function or a reporter group.

(iv)

wherein

Y is O or S;

Z is OH or CH3;

R is H, F, or OR2, where R2 is H, C1-6 alkyl or allyl, and
R4 is H, C1-6 alkyl, C1-6 alkenyl, C1-6 alkynyl, or optionally the
5-position of the pyrimidine serves as a point of attachment for a
cross-linking function or a reporter group.

wherein

X is N or CH;

Y is O or S;

Z is OH or CH3;

R is H, F, or OR2, where R2 is H, C1-6 alkyl or allyl, and
R1 is H, C1-4 alkyl, C1-4 alkoxy, C1-4 alkylthio, F, or NHR3 where
R3 is H, or C1-4 alkyl, and where the 8 position of the purine, the 3
position of the pyrazolopyrimidine or the 5 position of the
pyrrolopyrimidine optionally serve as a point of attachment for a
cross-linking function or a reporter group.

(ix)

wherein

Y is O or S;

Z is OH or CH3;

R is H, F, or OR2, where R2 is H, C1-6 alkyl or allyl,
R4 is H, C1-6 alkyl, C1-6 alkenyl, C1-6 alkynyl, or optionally the
5-position of the pyrimidine serves as a point of attachment for a
cross-linking function or a reporter group;


Z x is O or NH, and 
is H, or C x - « alkyl. 


Figure 8 
Recombinant splice variant #1
LNA probes (target exon)


EQ NO  Oligo Name

8253   Menkes.02 50NH2C6-DNA
8254   Menkes.02 50NH2C6-4.LNA
8255   Menkes.02 50NH2C6-3.LNA
8256   Menkes.02 50NH2C6-2.LNA

8258   Menkes.04 50NH2C6-DNA
8259   Menkes.04 50NH2C6-4.LNA
8260   Menkes.04 50NH2C6-3.LNA
8261   Menkes.04 50NH2C6-2.LNA

8263   Menkes.06 50NH2C6-DNA
8264   Menkes.06 50NH2C6-4.LNA
8265   Menkes.06 50NH2C6-3.LNA
8266   Menkes.06 50NH2C6-2.LNA

8268   Menkes.08 50NH2C6-DNA
8269   Menkes.08 50NH2C6-4.LNA
8270   Menkes.08 50NH2C6-3.LNA
8271   Menkes.08 50NH2C6-2.LNA

8273   Menkes.10 50NH2C6-DNA
8274   Menkes.10 50NH2C6-4.LNA
8275   Menkes.10 50NH2C6-3.LNA
8276   Menkes.10 50NH2C6-2.LNA

8278   Menkes.12 50NH2C6-DNA
8279   Menkes.12 50NH2C6-4.LNA
8280   Menkes.12 50NH2C6-3.LNA
8281   Menkes.12 50NH2C6-2.LNA

8283   Menkes.14 50NH2C6-DNA
8284   Menkes.14 50NH2C6-4.LNA
8285   Menkes.14 50NH2C6-3.LNA
8286   Menkes.14 50NH2C6-2.LNA

8288   Menkes.16 50NH2C6-DNA
8289   Menkes.16 50NH2C6-4.LNA
8290   Menkes.16 50NH2C6-3.LNA
8291   Menkes.16 50NH2C6-2.LNA
Sequence

tctgTtgaGgglAtgamCttgmCaatTcctGtgiTtggAccaTtgaGcaoniCa 
TctGttG agGglAtgActTgcAatTccT gtGttTggAcc AttGagmCagmCa 
TcToTKSaGgGtAlGamCtTgmCaAtTcmaGtGtTlGgAcmCaTtGaGcAgmCa 

agaaaagcaatagaggctctatcaccggggctatatagagttagtatcac 
AgaaAagcAataGaggmCtgtAtcannCcggGgctAtalAGagTiagTatcAc 
AgaAaaGcaAtaGagGctGiaTcaniCcflGBgmCtaTatAgaGtiAgiAtcAc 
AgAaAaGcAaTaGaGgmaGtAtrnCaniGcGoGgmCtAtAtAgAgTlAgTaTcAc 

gctgttalacaacccocaalgalagcagagltcatccgagaacttggatt 
GctgTtatAcaamCtennCaatGamGcagAgttmCaicrnCgagAactTggaTl 
GctGnAiamCaamC ccmCcaAtgAt aGcaG agTtcAtcmCgaGaam CttGgaTt 
GcTgTlALAcAamCcmCcm CaAtGaTaGc Ag AgTlm CaTcm CgAgAamCtTgGaTt 

TcttTggtm CaagAagg AtcgG tea GcaaG team CttaG atcAtaaAcgaGa 

TciTtgGtcAagAagGatiriCggTcaGcaAgtmCacTtaGatmCatAaamCgaGa 

TcTtTgGtmCaAgAaGgAimCoGtmCaGcAaGtmCamClTaGaTcAtAaAcGaGa 

ttataaagcactgaagcataaga&agcaaatatggacgtactgattgtgc 
TtatAaagmCaetGaagmCataAgacAgcaAataTggarnCgtaniCtgaTtgtGc 
TtaTaaAgcAclG aaGcaTaaG acAgc AaaT atG gamCgtAtf GatT gtGc 
TtAfAaAgmCam CtGaAgmCaTaAgAcAgm CaAaTaTgGamCgTamCtGaTtGtGc 

aacaagtggatgtggaacttgtaeaacgtggagalatcattaaagtagtt 
AacaAgtgGalgTggaActtGtacAacgTggaGataTcatTaaaGtagTt 
AacAagTggAtgTggAacTtgTacAacGlgGagAtaTcaTlaAagTagTt 
AaniCaAgTgGaTgTg^aAcTtGtAcAamCgTflGaGaTaTcAtTaAaGlAgTt 

ccattgccaccctcttggtatggattglaattggatttcigaattttgaa 
mCcalTgccAcccTcttGgtaTggaTtgtAattGgatTlctGaatTttgAa 
mCcaTtgmCcamCccTclTggTatGgaTtgTaaTtgGatTtcTgaAttTtgAa 
mCcAtTgmCcAemCcTcTtGgTaTgGaTtGtAaTiGgAlTtmCtGaAtTiTgAa 

ggtatttgalaagactggaaccattactcacggaacoccagtggtgaatc 
Gg^TttgAtaaGactGgaamCcatTactrnCacg,GaacmCocaGtggTgaaTc 
GgtAttTgaTaaGacTggAacniCatTacTcaniCggAacmCccAgtGgtGaaTc 
GgTaTlTgAtAaGamCtGgAam CcAlTamCtmCamCgG aAcmCcmCaGtGgTgAaTc 




8293  Menkes.18 50NH2C6-DNA
8294  Menkes.18 50NH2C6-4.LNA
8295  Menkes.18 50NH2C6-3.LNA
8296  Menkes.18 50NH2C6-2.LNA


atlggtaacogggagtggatgattagaaatggtcngtcattaataacga 
AtlgGtaamCcggCSagtGgatGattAoaflAtggTcttGtcaTtaaTaacGa 
AttGgtAacmCggG agTggAl gArtAgaAatGgtmCnG tcAttAatAacGa 
AtTgGlAam CcG gGaGl GgAt GaTlAgAaAtG gTcTtGtmCaTlAaTaAcGa 


8298  Menkes.20 50NH2C6-DNA
8299  Menkes.20 50NH2C6-4.LNA
8300  Menkes.20 50NH2C6-3.LNA
8301  Menkes.20 50NH2C6-2.LNA


iggcacaggcacagatgtagccangaagcagctgaigtggttttgataa 
TggcAcagGcacAgatGtagmCcatTgaaGcagmCtgaTgtgGtttTgatAa 
TggmCacAggmC^cAgaTglAgcmCtolTgaAgcAgcTgaTgtGgtTttGalAa 
TgGcAcAgGcAcAgAtGtAgmCcAiTgAaGcAgmCtGaToTgGtTtTgAtAa 


EQ No.  Oligo Name

10573  Menkes.01 50NH2C6-2.LNA
10574  Menkes.03 50NH2C6-2.LNA
10575  Menkes.05 50NH2C6-2.LNA
10576  Menkes.07 50NH2C6-2.LNA
10577  Menkes.09 50NH2C6-2.LNA
10578  Menkes.11 50NH2C6-2.LNA
10579  Menkes.13 50NH2C6-2.LNA
10580  Menkes.14 50NH2C6-2.LNA
10581  Menkes.15 50NH2C6-2.LNA
10582  Menkes.17 50NH2C6-2.LNA
10583  Menkes.19 50NH2C6-2.LNA
10584  Menkes.21 50NH2C6-2.LNA
10585  Menkes.23 50NH2C6-2.LNA
10705  Menkes.22 50NH2C6-2.LNA


Sequence 

GtGamClTcTcmCgAtTgTgTgAgmCtTtGlTgGaGcmCIGcGtAcGtGgAlTt 
TtTlAamCtGamCam CcTlGtTtmCtG amCtGtTamCgGcGtmCam CIGamCtTtGcmCa 
mCaTamCaGgTcAcTgGcAtGaniCtTgmCgmaTcniCtGtGtAgmCaAamCaTtGaAc 
TgAgGgGaAtGamCgTBTgmCcTcmQGcGlAcAlAaAaTaGaGtmCtAgTcTc 
TgTaTtmCcTgTBAtGgGgmCtGaTgAcAtAtAlGaTg^(TaTgGeniCcAcmCa 
AcAtmCaGaGgmClmCtTgmCaAaGtTaAlT imCam ClAcAaGcTamCaGaAgmCaAc 
Tima^iTaAanCaGaAcGgGtmCemaGcTtAlmCtGcGcAamCaniCaTgTtGgAg 
mCcAlTgmCcAcroCcTcTtGgTaTgG aTtGtAaTtGg AlTlmClG aAlTVT g Aa 
GaAamCgAtAaTamCgAtTtG cTlTcm CaAg m CcT cTaTc AcAgTtm CtGtTgm C a . 
AtGaAcAgTcAlniCaAcTTmCgTcTtmCcAjGaTtArt 

GtTcTgAtGamCiGgAgAcAamCaGiAaAaniCaGcTaGaTcTaTlGcTtmCtrnCa 
TgGcAaGlAtTgAcTLAtmCaAg AaAg AcAgTc AaGaG gAiTcGgAtAaAt 
Gcm{^CtAlAaAcTcAcTaitiaGtmCnGaTaAam 

maGgAIGg^aTcTgmCaGcAaTgGcTgmCtTcAtmCtGlTtmaGtAgTamClTt 
#!/usr/bin/perl -w
#$Revision: 1.18 $
#
#$Id: oligod,v 1.18 2002/07/29 06:26:35 nt Exp $
#
#NAME
#  oligod - microarray oligo design
#
#SYNOPSIS
#  oligod [-blastdb blastdb] [-fastadb fastadb] fastaseq
#         [-length oligolength]
#         [-mindist minimumoligodistance]
#         [-maxhits n]
#         [-min_tm t]
#         [-max_tm t]
#         [-maxoligo n]
#         [-min_score n]
#         [-gene_ident_cutoff ident]
#         [-gene_len_cutoff len]
#         [-comp | -revcomp | -rev]
#         [-capitalize -cphase n -cfreq n]
#         [-verbose]
#         [-self_score]
#         [-param] parameterfile
#         [-conf] configurationfile
#         [-lna]
#         [-matrix] hybridisationmatrix
#
#DESCRIPTION
#  1. Blast against blastdb
#  2. Less than 17 consecutive matching nt's
#  3. Identity less than 60%
#  4. Palindrom = SmithWaterman against complement
#  5. Melting temperature (0.88xA/T) + (1.47xG/C) + (X)
#     [PICK70 script (DeRisi group)] (+X is by Niels Tolstrup)
#  6. Salt:
#     (http://jsl1.chem.wayne.edu/Hyther/hytherm2main.html)
#     a. Monovalent cation 1 mol/L
#     b. Mg2+ 0 mol/L
#     c. Hybridization temperature 37.0 degrees Celsius
#     d. Target 2e-7 mol/L
#     e. Primer 1e-9 mol/L
#        Number of primers to show.
#        Primer length.
#
#  -maxhits n
#     n is the maximal number of alignments to show for each oligo
#
#  -maxoligo n
#     n is the maximal number of oligos to suggest for each sequence
#
#  -min_score n
#     n is the minimal score required for a hit to be included in the
#     scoring of an oligo
#
#  -mindist n
#     Oligos will not overlap if n = oligolength
#     0 and 1 allow for all oligos
#
#  -gene_ident_cutoff ident
#     Mask hits with identity larger than or equal to ident.
#     Used together with gene_len_cutoff.
#     Default 98
#
#  -gene_len_cutoff len
#     Mask hits with alignment length longer than or equal to len.
#     Used together with gene_ident_cutoff.
#     Default 50
#
#  -lna
#     If a LNA matrix is used for selfhybridisation this must be set.
#
#  -matrix hybridisationmatrix
#     The format of the hybridisation matrix is that used by the fasta
#     package. LNA is represented by A=L C=I G=0 and T=U in the matrix.
#
#  -capitalize
#  -cphase phase
#  -cfreq freq
#  -end_spike_len len
#     phase is 0 to freq - 1
#     freq is the frequency of LNA, default is 4
#
#  -blastdb "db1 db2 db3"
#
#  -sense reverse
#     Uses the reverse complement of the query, use with strand = bottom
#
#  oligod reads the configuration file oligod.conf
#
#FILES
#  blastall
#  formatdb
#  dan
#  lynx
#  ssearch
#  dyp
#
#AUTHOR
#  Copyright Niels Tolstrup 2001,2002
#  tolstrup@exiqon.com
#  Exiqon
#

use strict;
use Getopt::Long;
use FindBin qw($Bin);
use lib "$Bin/../lib";
use LWP::UserAgent;
use HTTP::Cookies;

my $glb;
$glb->{dir}           = $Bin;
$glb->{conf}          = "$glb->{dir}/oligod.conf";
$glb->{tmpdir}        = "$glb->{dir}/tmp";
$glb->{program_name}  = "$glb->{dir}/oligod";
$glb->{lynx_cookie}   = "$glb->{dir}/lynx_cookies";
$glb->{dyp}           = "$glb->{dir}/bin/dyp";
$glb->{bin}           = "/usr/bin";
$glb->{formatdb}      = "$glb->{bin}/formatdb";
$glb->{blastall}      = "$glb->{bin}/blastall";
$glb->{ssearch}       = "$glb->{bin}/ssearch";
$glb->{dan}           = "$glb->{bin}/dan";
$glb->{lynx_env}      = "LD_LIBRARY_PATH=/usr/local/lib";
$glb->{lynx}          = "$glb->{lynx_env} $glb->{bin}/lynx";
$glb->{tm_exiqon_url} = "http://armstrong/cgi-bin/tmpredict.cgi";
$glb->{tm_url}        = "http://lna-tm.com/";
$glb->{ua}            = LWP::UserAgent->new;
$glb->{cookiefn}      = "$glb->{dir}/lwpcookies.txt";
$glb->{ua}->cookie_jar( HTTP::Cookies->new( file => $glb->{cookiefn},
                                            autosave => 1));

$glb->{version} = "0.9";
$glb->{verbose} = 0;
#$glb->{view} =
#  "BLAST:FASTA:SELF_EVIDENCE:PROBE:RUN_TIME:PARAMETER" .
#  ":SCORE_EVIDENCE";
#$glb->{view} =
#  "BLAST:SELF_EVIDENCE:PROBE:RUN_TIME:SCORE_EVIDENCE";
$glb->{view} =
  "PARAMETER:FASTA:BLAST:SELF_EVIDENCE:PROBE:RUN_TIME" .
  ":SCORE_EVIDENCE:RUNTIME:TARGET_STRUCT";
#$glb->{view} = "";

########## XML I/O ##########

#use XML::Dumper;
#sub dump_parameter_xml($$) {
#  my ( $fn, $dat) = @_;
#  my $dump = new XML::Dumper;
#  my $xml = $dump->pl2xml( $dat);
#  local *FILE;
#  open( FILE, "> $fn") || die "Could not open $fn $!";
#  print FILE "$xml\n";
#  close FILE;
#}
#
#sub read_parameter_xml($) {
#  my ( $fn) = @_;
#  my $xml = "";
#  local *FILE;
#  open( FILE, "< $fn") || die "Could not open $fn $!";
#  while ( <FILE>) {
#    $xml .= $_;
#  }
#  my $dump = new XML::Dumper;
#  my $dat = $dump->xml2pl( $xml);
#  close FILE;
#  $dat;
#}

########## PARAMETER I/O ##########

sub read_conf($) {
  my ( $glb) = @_;
  my $var;
  local *FILE;

  open( FILE, "$glb->{conf}") || die "Could not open $glb->{conf}";
  while (<FILE>) {
    chomp;
    if( /^\s*#/ ) { }
    elsif( /^\s*$/ ) { }
    elsif( /^\s*(\S+)\s+([^#]+)\s*/ ) {
      # Only keys that already exist in $glb can be overridden
      if( defined $glb->{$1}) {
        $glb->{$1} = $2;
      }
    }
    else {
      print "$_ was not recognized\n";
    }
  }
  close FILE;

  # Relative tmpdir and lynx_cookie are taken relative to the program dir
  foreach $var ( "tmpdir", "lynx_cookie" ) {
    if( $glb->{$var} !~ /\// ) {
      $glb->{$var} = "$glb->{dir}/$glb->{$var}";
    }
  }
}
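The configuration syntax that read_conf accepts (comment lines starting with #, blank lines, and whitespace-separated key/value pairs that only override keys the program already knows) can be illustrated in isolation. The following is a minimal sketch and not part of the oligod listing: parse_conf is a hypothetical helper that works on a list of lines instead of a file, tests for known keys with exists rather than defined, and, as an added assumption, trims trailing whitespace from the value.

```perl
use strict;
use warnings;

# Hypothetical helper, not in the listing: apply "key value" lines to a
# copy of the known settings and return the result as a hash reference.
sub parse_conf {
    my ($known, @lines) = @_;
    my %conf = %$known;
    foreach my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*#/;      # comment line
        next if $line =~ /^\s*$/;      # blank line
        if ($line =~ /^\s*(\S+)\s+([^#]+)/) {
            my ($key, $value) = ($1, $2);
            $value =~ s/\s+$//;        # assumption: drop trailing blanks
            $conf{$key} = $value if exists $conf{$key};
        }
        else {
            warn "$line was not recognized\n";
        }
    }
    return \%conf;
}

my $conf = parse_conf(
    { tmpdir => "tmp", blastall => "/usr/bin/blastall" },
    "# local overrides\n",
    "\n",
    "tmpdir   /scratch/oligod   # trailing comment\n",
    "unknown_key 1\n",
);
print "$_ = $conf->{$_}\n" foreach sort keys %$conf;
```

Unknown keys are silently ignored here, as in the listing, so a misspelled key leaves the default in place.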


sub read_param($$) {
  my ( $param, $file) = @_;
  my $warning = "";
  my $tmp1;
  my $tmp2;

  while (<$file>) {
    if( /^\s*#/ ) { }
    elsif( /^\s*$/ ) { }
    elsif( /^\s*<PARAMETER>$/ ) { }
    elsif( /^\s*<\/PARAMETER>$/ ) {
      last;
    }
    elsif( /^\s*oligod_param/i ) {
      if( /^\s*oligod_param\s+(\S+)\s+(\S+)\s+(\S+)/i ) {
        $tmp1 = $1;
        $tmp2 = $2;
        $tmp1 =~ tr/A-Z/a-z/;
        $tmp2 =~ tr/A-Z/a-z/;
        if( defined $param->{oligod_param}{$tmp1} ) {
          if( defined $param->{oligod_param}{$tmp1}{$tmp2} ) {
            $param->{oligod_param}{$tmp1}{$tmp2} = $3;
          }
          else {
            $warning .= "Did not recognize $tmp2 in $_";
          }
        }
        else {
          $warning .= "Did not recognize $tmp1 in $_";
        }
      }
      else {
        $warning .= "Not enough values for oligod_param in $_";
      }
    }
    elsif( /^\s*blast_param/i ) {
      if( /^\s*blast_param\s+(\S+)\s+(.+)/i ) {
        $tmp1 = $1;
        $tmp1 =~ tr/A-Z/a-z/;
        if( defined $param->{blast_param}{$tmp1} ) {
          $param->{blast_param}{$tmp1} = $2;
        }
        else {
          $warning .= "Did not recognize $tmp1 in $_";
        }
      }
      else {
        $warning .= "Not enough values for blast_param in $_";
      }
    }
    elsif( /^\s*(\S+)\s+([^#]+)/ ) {
      $tmp1 = $1;
      $tmp1 =~ tr/A-Z/a-z/;
      if( defined $param->{$tmp1} ) {
        $tmp2 = $2;
        $tmp2 =~ s/^(.*?)\s*$/$1/;
        $param->{$tmp1} = $tmp2;
      }
      else {
        $warning .= "Did not recognize $tmp1 in $_";
      }
    }
    else {
      $warning .= "Key value pair expected, line not recognized: $_";
    }
  }
  $warning;
}

55 


sub update_derived_param($) {
  my ($param) = @_;

  foreach ( keys %{$param->{oligod_param}} ) {
    $param->{oligod_param}{$_}{squash_factor} = squash_factor(
      $param->{oligod_param}{$_}{squash_dx},
      $param->{oligod_param}{$_}{squash_dy});
  }
}


sub initialize_param($) {
  my ($popt) = @_;
  my $mat;

  # Parameter priority
  # 1. Commandline parameters.
  # 2. Parameters from parameter file.
  # 3. Default values.


my $param = { 

■fastadb" 


scoring 


strands" 


"blastdb", 
"oligo_length" , 
*oligo_sense" , 
" cphase ■ , 0 , 

"cfreq", 3, 
n end^spike_len n , 
"mindist " , 
■maxhits " . 
"max_noligo" , 10 , 

■min_score" , 0, # 3 5- ignore < 18 matches in oligo 


■/home/nt/database/bt7h_sapiens" 
50, 

-direct" . 


0, 

50, 

8, 


"dnaconc", 2000, # nMol 

"saltconc", 115, # mMol 
"gene_len__cutof f " , 50, 
■gene_ident_cutof f tt , 98, 


"blast_param B , 


■lna" 
); 


("wordlen", 11, "strand", "both 
'expect", 50, "nproc", 2, "filter" 


  $param->{oligod_param} = {
    "self_match",    { "weight", 1, "cutoff", 50,
                       "squash_dx", 10, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "self_hyp",      { "weight", 0, "cutoff", 50,
                       "squash_dx", 10, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "tm_min",        { "weight", 1, "cutoff", 10,
                       "squash_dx", 2, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 0,
                     },
    "tm_max",        { "weight", 1, "cutoff", 100,
                       "squash_dx", 2, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 0,
                     },
    "tm",            { "weight", 1, "cutoff", 1,
                       "squash_dx", 0.8, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "tm_dan_min",    { "weight", 1, "cutoff", 10,
                       "squash_dx", 2, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 0,
                     },
    "tm_dan_max",    { "weight", 1, "cutoff", 100,
                       "squash_dx", 2, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 0,
                     },
    "tm_dan",        { "weight", 1, "cutoff", 1,
                       "squash_dx", 0.8, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "hit_score",     { "weight", 0, "cutoff", 10000,
                       "squash_dx", 5000, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "target_struct", { "weight", 1, "cutoff", 30,
                       "squash_dx", 20, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "max_match",     { "weight", 1, "cutoff", 30,
                       "squash_dx", 5, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "max_stretch",   { "weight", 1, "cutoff", 20,
                       "squash_dx", 2, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "ggg",           { "weight", 0, "cutoff", 3,
                       "squash_dx", 1, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
    "cg",            { "weight", 0, "cutoff", 1,
                       "squash_dx", 1, "squash_dy", 0.9,
                       "squash_factor", 1, "layer", 1,
                     },
  };



  if( $popt->{paramfn} ) {
    local *FILE;
    open( FILE, "< $popt->{paramfn}") ||
      die "Could not open $popt->{paramfn} $!";
    print read_param( $param, \*FILE);
    close FILE;
  }

  foreach ( "oligo_length", "oligo_sense", "mindist", "maxhits",
            "max_noligo",
            "cphase", "cfreq", "end_spike_len", "min_score", "lna",
            "matrix",
            "fastadb", "blastdb", "gene_ident_cutoff",
            "gene_len_cutoff" ) {
    if ( defined $popt->{$_} ) {
      $param->{$_} = $popt->{$_};
    }
  }
  if( defined $popt->{oligod_param_ggg_weight} ) {
    $param->{oligod_param}{ggg}{weight} =
      $popt->{oligod_param_ggg_weight};
  }
  if( defined $popt->{oligod_param_cg_weight} ) {
    $param->{oligod_param}{cg}{weight} =
      $popt->{oligod_param_cg_weight};
  }
  if ( defined $popt->{oligod_param_tm_min} ) {
    $param->{oligod_param}{tm_min}{cutoff} =
      $popt->{oligod_param_tm_min};
  }
  if( defined $popt->{oligod_param_tm_max} ) {
    $param->{oligod_param}{tm_max}{cutoff} =
      $popt->{oligod_param_tm_max};
  }
  if( defined $popt->{oligod_param_tm_dan_min} ) {
    $param->{oligod_param}{tm_dan_min}{cutoff} =
      $popt->{oligod_param_tm_dan_min};
  }
  if( defined $popt->{oligod_param_tm_dan_max} ) {
    $param->{oligod_param}{tm_dan_max}{cutoff} =
      $popt->{oligod_param_tm_dan_max};
  }

  if( $param->{matrix} ) {
    if ( ! -e $param->{matrix}) {
      die "Did not find $param->{matrix} $!";
    }
  }
  else {
    if( $param->{lna} ) { $param->{matrix} = "$glb->{dir}/lna.mat"; }
    else                { $param->{matrix} = "$glb->{dir}/dna.mat"; }
    if( ! -e $param->{matrix}) {
      if( $param->{lna} ) { $mat = default_mat( "lna"); }
      else                { $mat = default_mat( "dna"); }
      write_file( $mat, $param->{matrix});
    }
  }

  update_derived_param( $param);
  $param;
}
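The priority order stated in the comments of initialize_param (command line first, then parameter file, then defaults) amounts to layering hashes. The sketch below is illustrative only and not part of the listing: resolve_param is a hypothetical helper, the three example hashes are invented values, and the rule that undefined or unknown keys are skipped is an assumption modelled on the defined tests in the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: later sources override earlier ones, but only for
# keys that exist in the defaults and only when the value is defined.
sub resolve_param {
    my ($default, @overrides) = @_;
    my %param = %$default;
    foreach my $src (@overrides) {
        foreach my $key (keys %$src) {
            next unless exists $param{$key};
            next unless defined $src->{$key};
            $param{$key} = $src->{$key};
        }
    }
    return \%param;
}

my $default = { oligo_length => 50, cfreq => 3, maxhits => 8 };
my $file    = { cfreq => 4, maxhits => 20 };            # parameter file
my $cmdline = { maxhits => 5, oligo_length => undef };  # option not given

# Lowest priority first, highest priority last.
my $param = resolve_param($default, $file, $cmdline);
printf "%s=%s\n", $_, $param->{$_} foreach sort keys %$param;
```

Passing the sources in ascending priority keeps the helper to a single loop; the listing achieves the same effect by reading the parameter file into the defaults and then copying defined command-line options over them.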


40 


sub parse_argv($) {
  my ( $glb) = @_;
  my $opt;
  my $popt;

  GetOptions( "param:s"             => \$popt->{paramfn},
              "conf:s"              => \$glb->{conf},
              "fastadb:s"           => \$popt->{fastadb},
              "blastdb:s"           => \$popt->{blastdb},
              "length:i"            => \$popt->{oligo_length},
              "sense:i"             => \$popt->{oligo_sense},
              "mindist:i"           => \$popt->{mindist},
              "cphase:i"            => \$popt->{cphase},
              "cfreq:i"             => \$popt->{cfreq},
              "end_spike_len:i"     => \$popt->{end_spike_len},
              "min_tm_dan:i"        => \$popt->{oligod_param_tm_dan_min},
              "max_tm_dan:i"        => \$popt->{oligod_param_tm_dan_max},
              "min_tm:i"            => \$popt->{oligod_param_tm_min},
              "max_tm:i"            => \$popt->{oligod_param_tm_max},
              "min_score:i"         => \$popt->{min_score},
              "maxhits:i"           => \$popt->{maxhits},
              "maxoligo:i"          => \$popt->{max_noligo},
              "lna:i"               => \$popt->{lna},
              "gene_ident_cutoff:i" => \$popt->{gene_ident_cutoff},
              "matrix:s"            => \$popt->{matrix},
              "rev"                 => \$opt->{rev},
              "comp"                => \$opt->{comp},
              "revcomp"             => \$opt->{revcomp},
              "capitalize"          => \$opt->{capitalize},
              "verbose"             => \$glb->{verbose},
              "view:s"              => \$glb->{view},

=> \$opt->{ stdio} 
  if ( $opt->{stdio} ) {
    push @ARGV,


    if ( $#ARGV == -1) {
      usage( $glb, "No sequence file");
    }
  }

  ($opt, $popt);


sub dump_param($$) {
  my ( $glb, $param) = @_;
  my $key;
  my $property;
  my $value;
  my $str = "<PARAMETER>\n";

  foreach $key ( sort keys %$param) {
    if ( $key eq "oligod_param" ) {
      foreach $property ( sort keys %{$param->{$key}} ) {
        foreach $value ( sort keys %{$param->{$key}{$property}} ) {
          $str .= sprintf( "%-16s %-16s %-16s %g\n", $key, $property,
                           $value,
                           $param->{$key}{$property}{$value});
        }
      }
    }
    elsif ( $key eq "blast_param" ) {
      foreach $property ( sort keys %{$param->{$key}} ) {
        $str .= sprintf( "%-16s %-16s %s\n", $key, $property,
                         $param->{$key}{$property});
      }
    }
    else {
      $str .= sprintf( "%-16s %s\n", $key, $param->{$key});
    }
  }
  $str .= "</PARAMETER>\n";
  $str;
}


########## SEQUENCE I/O ##########

sub print_fasta($$) {
  my ($seq, $filept) = @_;
  my $sq = $seq->{seq};
  my $str;

  print $filept ">$seq->{name}";
  if( defined $seq->{comment} ) {
    $str = substr( $seq->{comment}, 0, 55);
    print $filept " $str";
  }
  print $filept "\n";
  if( defined $sq ) {
    $sq =~ tr/A-Za-z//cd;
    # Full lines of 60 residues, then whatever remains
    while ( $sq =~ /(.{60})/gc ) {
      print $filept "$1\n";
    }
    if ( $sq =~ /(.+)/gc ) {
      print $filept "$1\n";
    }
  }
  else {
    print $filept "\n";
  }
}
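The FASTA layout produced by print_fasta (a header line with the name and up to 55 characters of comment, followed by the sequence in lines of 60 residues) can be shown without a file handle. This is a minimal sketch rather than the listing's routine: format_fasta is a hypothetical name, it returns a string instead of printing, and it folds the two regular expressions of the original into a single {1,60} match.

```perl
use strict;
use warnings;

# Hypothetical string-returning variant of the FASTA writer.
sub format_fasta {
    my ($name, $comment, $sq) = @_;
    my $out = ">$name";
    $out .= " " . substr($comment, 0, 55) if defined $comment;
    $out .= "\n";
    $sq =~ tr/A-Za-z//cd;                # keep letters only
    while ($sq =~ /(.{1,60})/g) {        # wrap at 60 residues per line
        $out .= "$1\n";
    }
    return $out;
}

print format_fasta("demo", "example record", "acgt" x 35);
```

For the 140-residue example this yields a header and three sequence lines of 60, 60 and 20 residues.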

sub read_fasta($$) {
  my ( $lastline, $filept) = @_;
  my $seq;

  if( !$$lastline) {
    while ( <$filept>) {
      if( /^>/ ) {
        $$lastline = $_;
        last;
      }
    }
  }
  if( $$lastline) {
    if( $$lastline =~ />(\S+)\s*(\S*)/ ) {
      $seq->{name} = $1;
      $seq->{comment} = $2;
      $seq->{seq} = "";
      $$lastline = "";
      while ( <$filept> ) {
        if( /^>/ ) {
          $$lastline = $_;
          $seq->{seq} =~ tr/a-zA-Z//cd;
          return $seq;
        }
        $seq->{seq} .= $_;
      }
    }
    else {
      $seq->{name} = "";
      $seq->{comment} = "";
      $seq->{seq} = "";
    }
  }
  if( $seq ) {
    $seq->{seq} =~ tr/a-zA-Z//cd;
  }
  $seq;
}


sub print_fasta_fn($$) {
  my ($seq, $fn) = @_;
  local *FILE;

  open( FILE, ">$fn") || die "Could not open $fn $!";
  print_fasta( $seq, \*FILE);
  close FILE;
}

sub write_file($$) {
  my ($str, $fn) = @_;
  local *FILE;

  open( FILE, ">$fn" ) || die "Could not open $fn $!";
  print FILE $str;
  close FILE;
}

sub read_file($) {
  my ($fn) = @_;
  local *FILE;
  my $str = "";

  open( FILE, "<$fn") || die "Could not open $fn $!";
  while ( <FILE>) {
    $str .= $_;
  }
  close FILE;
  $str;
}


########## BLAST ##########

sub formatblastdb($$$) {
  my ( $glb, $dbfn, $type) = @_;
  my $db = $dbfn;
  my $size;
  my $mtime;
  my $prev_size = 0;
  my $prev_mtime = 0;
  local *FILE;
  my $d;

  $db =~ s/(.*)\.([^.]*)$/$1/;   # Remove suffix
  $db =~ s/.*\/([^\/]*)$/$1/;    # Remove path

  print "Do I need to format $db?\n";
  # stat returns a 13-element list; size and mtime are fields 7 and 9
  ($size, $mtime) = (stat $dbfn)[7, 9];
  if( -e "$glb->{tmpdir}/$db.stat" ) {
    open( FILE, "<$glb->{tmpdir}/$db.stat") ||
      die "Could not open $glb->{tmpdir}/$db.stat $!";
    while ( <FILE>) {
      if( /size=\s*(\d+).*mtime=\s*(\d+)/ ) {
        $prev_size = $1;
        $prev_mtime = $2;
      }
    }
    close FILE;
  }

  if( ! -e "$glb->{tmpdir}/$db" || $prev_size != $size ||
      $prev_mtime < $mtime ) {
    print " YES\n";
    $d = "$glb->{tmpdir}/$db";
    if ( -l $d ) {
      unlink $d;
    }
    if( $dbfn =~ /^\// ) {
      $d = $dbfn;
    }
    else {
      my $cwd;
      chomp( $cwd = `pwd`);
      $d = "$cwd/$dbfn";
    }
    print `ln -s $d $glb->{tmpdir}/$db`;
    print `cd $glb->{tmpdir}; $glb->{formatdb} -t $db -i $db -p $type`;
    print `echo "size= $size mtime= $mtime" > $glb->{tmpdir}/$db.stat`;
  }
  else {
    print " No, The database is ok.\n";
  }
  $db = "$glb->{tmpdir}/$db";
  $db;
}


sub blast($$$$$) {
  my ( $glb, $queryseq, $dbfn, $method, $parameters) = @_;
  my $queryfn = "$glb->{tmpdir}/$$.fasta";
  local *FILE;
  local *PIPE;
  my $result = "";

  open( FILE, ">$queryfn") || die "Could not open $queryfn $!";
  print_fasta( $queryseq, \*FILE);
  close FILE;
  #print "$glb->{blastall} $parameters -p $method -d $dbfn -i $queryfn\n";
  open( PIPE,
        qq($glb->{blastall} $parameters -p $method -d "$dbfn" -i $queryfn |)) ||
    die "Failed to run $glb->{blastall}";
  while ( <PIPE>) {
    $result .= $_;
  }
  close PIPE;

  unlink $queryfn;
  $result;
}


sub parse_blast($) {
  my ($blastout) = @_;
20 ray $ query = q{Score =\s* { \d+? . *?} . 

q{Expect =As* ( ( \deE\- . ]* ) . * ? > . 
q{\U\d+)%N) .*?} . 
q{Strand = (Plus)Minus) \J ( Plus [Minus ).*?} . 
<!{< (Query: \a* { \d+) \s+ > ( [~ 3 + )r>%)*> . 
25 d( <\d+> [~S}+Sb;jctI~>%) *) \n\n[^>S] *) ; 

  my %result = ();
  my $expect;
  my $name;
  my $score;
  my $ident;
  my $qstrand;
  my $tstrand;
  my $qstart;
  my $qend;
  my $tstart;
  my $tend;
  my $alignment;
  my $blastpos1;
  my $blastpos2;
  my $l;
  my $sl;
  my $b;
  my $queryseq = "";
  my $matchseq = "";
  my $targetseq = "";
  my $target;
  my $target_no = 0;
  my $qlen = 0;

  if( $blastout =~ /^Query=\s*(\S+).*?\((\d+)\s*letters\)/sm ) {
    $result{query_name} = $1;
    $result{qlen} = $2;
  }
  else {
    $result{query_name} = "";
    $result{qlen} = 0;
  }
  $result{hits} = ();
  while ( $blastout =~ /^(>(\S+)[^>]*)/smgc ) {
    $target = $1;
    $name = $2;
    $target_no++;
    $blastpos1 = pos;
    while ( $target =~ /$query/smgc ) {
      #print "$2, $3, $4\n";
      $tstart    = '';
      $score     = $1;
      $expect    = $2;
      $ident     = $3;
      $qstrand   = $4;
      $tstrand   = $5;
      $alignment = $6;
      $l         = $7;
      $qstart    = $8;
      $b         = $9;
      $qend      = $10;
      $blastpos2 = pos;

      $queryseq = "";
      $matchseq = "";
      $targetseq = "";
      $l = length( $l);
      $sl = $l - length("Query:");
      $b = length( $b);

      while ( $alignment =~
              /^Query:.{$sl}(.{1,$b})\ \d+\n.{$l}(.{1,$b})\nSbjct:(.{$sl})(.{1,$b})\ (\d+)/mgcx )

      {
        $queryseq .= $1;
        $matchseq .= $2;
        $targetseq .= $4;
        if ( $tstart eq '' ) {
          $tstart = $3;
        }
        $tend = $5;
      }
      #print "\n1: $queryseq\n2: $matchseq\n3: $targetseq\n";
      $expect =~ s/^e/1e/;
      $qlen = $qend - $qstart + 1;
      push @{$result{hits}},
        { "tname", $name, "score", $score, "expect", $expect,
          "ident", $ident,
          "qstrand", $qstrand, "tstrand", $tstrand, "target_no",
          $target_no,
          "qstart", $qstart, "qend", $qend, "qlen", $qlen,
          "tstart", $tstart, "tend", $tend,
          "self", 0,
          "queryseq", $queryseq, "matchseq", $matchseq,
          "targetseq", $targetseq};
      pos = $blastpos2;
    }
    pos = $blastpos1;
  }
  %result;
}


########## perceptron ##########

sub squash_factor($$) {
  # This factor determines how steep the squashing function operates:
  # a change of $x from 0 gives a $y response if used with $k
  my ($x, $y) = @_;
  my $k = - log(1/$y - 1) / $x;
  $k;
}

sub squash($) {
  my ($x) = @_;
  my $y = 1/( 1+exp( -$x));
  $y;
}
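The relation between squash_factor and squash is easiest to see with numbers. The sketch below re-declares both functions (without prototypes, so it runs on its own) and checks that a slope computed from squash_dx = 10 and squash_dy = 0.9, the defaults used for several entries of the parameter table, makes the logistic function return 0.9 at a distance of 10 from the cutoff. The printed values are illustrative and not taken from the original document.

```perl
use strict;
use warnings;

# Logistic squashing function, as in squash().
sub squash {
    my ($x) = @_;
    return 1 / (1 + exp(-$x));
}

# Slope k such that squash($k * $dx) equals $dy, as in squash_factor().
sub squash_factor {
    my ($dx, $dy) = @_;
    return -log(1 / $dy - 1) / $dx;
}

my $k = squash_factor(10, 0.9);      # squash_dx = 10, squash_dy = 0.9
printf "k              = %.4f\n", $k;
printf "squash(k * 10) = %.2f\n", squash($k * 10);
printf "squash(0)      = %.2f\n", squash(0);
```

At the cutoff itself the response is always 0.5, so squash_dx and squash_dy together fix how quickly a score saturates on either side.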

20 

sub dot($$) {
  my ($a, $b) = @_;
  my $sum = 0;
  my $i;

  if ( $#$a != $#$b) {
    die "dot: a ($#$a) and b ($#$b) should have the same number of elements";
  }

  for( $i=0; $i<=$#$a; $i++) {
    #if( ! defined $a->[ $i] ) { die "OHOH a $i was not defined" }
    #if( ! defined $b->[ $i] ) { die "OHOH b $i was not defined" }
    $sum += $a->[ $i] * $b->[ $i];
  }
  $sum;
}

sub sum($) {
  my ($a) = @_;
  my $sum = 0;
  my $i;

  for( $i=0; $i<=$#$a; $i++) {
    $sum += $a->[ $i];
  }
  $sum;
}

sub perceptron($$) {
  my ($weight, $input) = @_;
  my $sum = sum( $weight);
  #my $output = squash( dot( $weight, $input));
  my $output = dot( $weight, $input) / $sum;
  $output;
}
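As written, perceptron returns the weight-normalised dot product, that is, a weighted mean of the input scores rather than a squashed activation. The following sketch shows this on invented numbers; the function bodies are condensed restatements of dot, sum and perceptron above, and the guard against a zero weight sum is an addition that the listing does not contain.

```perl
use strict;
use warnings;

sub dot {
    my ($w, $x) = @_;
    die "dot: vectors differ in length\n" if @$w != @$x;
    my $sum = 0;
    $sum += $w->[$_] * $x->[$_] foreach 0 .. $#$w;
    return $sum;
}

sub sum {
    my ($w) = @_;
    my $sum = 0;
    $sum += $_ foreach @$w;
    return $sum;
}

# Weighted mean of the inputs; a zero weight sum is rejected (assumption).
sub perceptron {
    my ($weight, $input) = @_;
    my $total = sum($weight);
    die "perceptron: weights sum to zero\n" if $total == 0;
    return dot($weight, $input) / $total;
}

# Three partial scores; the third has weight 0 and is therefore ignored.
my $score = perceptron([1, 1, 0], [0.9, 0.5, 0.1]);
printf "combined score = %.2f\n", $score;
```

A score with weight 0, such as the default for ggg or cg in the parameter table, therefore has no influence on the combined value.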


5 

########## SYSTEM ##########

sub openfile($) {
  my ($fn) = @_;
  local *FILE;

  open( FILE, $fn) ||
    die "Could not open $fn $!\n";
  *FILE;
}

sub makedir($) {
  my ($dir) = @_;
  my @dir = split(/\//, $dir);
  my $d = "";

  # Create each missing directory along the path, left to right
  foreach (@dir) {
    $d .= $_;
    if ( $d && ! -e $d ) { mkdir $d; }
    $d .= "/";
  }
}

30 

sub usage($$) {
  my ($glb, $message) = @_;
  my $file = openfile( $glb->{program_name});

  $message .= "\n";
  # Echo the leading comment block of the program as the usage text
  while (<$file>) {
    if( /^#(.*)/ ) {
      $message .= "$1\n";
    }
    else {
      last;
    }
  }
  close $file;
  die "$message";
}

sub min(@) {
  my $min = 99999999999;
  foreach (@_) { if ( $_ < $min ) { $min = $_; } }
  $min;
}

sub max(@) {
  my $max = -99999999999;
  foreach (@_) { if( $_ > $max ) { $max = $_; } }
  $max;
}

sub up($) {
  my ( $str) = @_;
  $str =~ tr/a-z/A-Z/;
  $str;
}

########## SEQUENCE FUNCTIONS ##########

sub comp($) {
  my ($seq) = @_;
  $seq =~

tr/acgtuxrurwsykvho^xxiACGTUMRW 
15 RMBDHVXX / ; 

$seq; 

} 

sub rev($) {
  my ($seq) = @_;
  $seq = reverse( split( "", $seq));
  $seq;
}

sub revcomp($) {
  my ($seq) = @_;
  $seq = rev( $seq);
  $seq = comp( $seq);
  $seq;
}
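A reduced version of the three sequence functions makes their composition concrete. The sketch below is not the listing's code: it handles only the four unambiguous bases in either case, whereas the original comp also translates further symbols, and it uses scalar reverse on the string instead of splitting it into characters.

```perl
use strict;
use warnings;

# Complement for a, c, g, t only (reduced alphabet, an assumption).
sub comp {
    my ($seq) = @_;
    $seq =~ tr/acgtACGT/tgcaTGCA/;
    return $seq;
}

# Reverse the order of the residues.
sub rev {
    my ($seq) = @_;
    return scalar reverse $seq;
}

# Reverse complement: reverse first, then complement.
sub revcomp {
    my ($seq) = @_;
    return comp(rev($seq));
}

foreach my $s ("aacg", "AcgTac") {
    printf "%-8s rev=%-8s comp=%-8s revcomp=%s\n",
           $s, rev($s), comp($s), revcomp($s);
}
```

Because tr maps upper case to upper case, any capitalisation that marks LNA positions travels with the residue through both operations.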

sub capitalize($$$$) {
  my ( $seq, $phase, $freq, $end_spike_len) = @_;
  my $pat = "." x $phase;
  my $n = $freq - $phase - 1;

  $seq =~ tr/A-Z/a-z/;

  if( $freq != 0 ) {
    if( $phase < 0 || $phase >= $freq ) {
      die "phase ($phase) is out of range\n";
    }
    if ( $freq < 1 ) {
      die "freq ($freq) is out of range\n";
    }
    # Upper-case one residue in every window of $freq residues
    $seq =~ s/($pat)(.)(.{0,$n})/$1 . up($2) . $3/eg;
  }
  if( $end_spike_len != 0 ) {
    $seq =~ s/(.{0,$end_spike_len})(.*)(.{$end_spike_len})/up($1) . $2 . up($3)/eg;
  }
  $seq;
}
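The effect of the phase and freq arguments can be checked on a short sequence. The sketch below isolates the periodic substitution of capitalize under the hypothetical name spike; it omits the end-spike step and uses the built-in lc and uc in place of tr and up. Upper case marks the positions intended for LNA monomers, and the input sequence is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical reduced form of capitalize(): periodic spiking only.
sub spike {
    my ($seq, $phase, $freq) = @_;
    die "freq ($freq) is out of range\n" if $freq < 1;
    die "phase ($phase) is out of range\n" if $phase < 0 || $phase >= $freq;
    my $pat = "." x $phase;          # residues skipped before the spike
    my $n   = $freq - $phase - 1;    # residues left in the window after it
    $seq = lc $seq;
    $seq =~ s/($pat)(.)(.{0,$n})/$1 . uc($2) . $3/eg;
    return $seq;
}

my $seq = "acgtacgtacgt";
printf "freq 4, phase 0: %s\n", spike($seq, 0, 4);
printf "freq 3, phase 0: %s\n", spike($seq, 0, 3);
printf "freq 3, phase 1: %s\n", spike($seq, 1, 3);
```

With phase 0 the first residue of every window is capitalised, which is the pattern visible in the 4.LNA, 3.LNA and 2.LNA oligo sequences of Figure 8.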
########## MELTING TEMPERATURE ##########

sub melt_temp($) {
  my ($seq) = @_;
  my $temp;
  my $n_at;
  my $n_gc;
  my $n_ot;

  $seq =~ tr/a-zA-Z//cd;
  $_ = $seq;
  $n_at = tr/aAtT//d;
  $_ = $seq;
  $n_gc = tr/gGcC//d;
  $_ = $seq;
  $n_ot = tr/aAtTgGcC//cd;
  $temp = (0.88 * $n_at) + (1.47 * $n_gc) + $n_ot;

  ($n_at, $n_gc, $n_ot, $temp);
}
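The empirical formula used by melt_temp, 0.88 per A/T, 1.47 per G/C and 1 per any other letter, can be worked through on a small case. The sketch below restates the function under the hypothetical name melt_temp_sketch, counting with non-destructive tr on the sequence itself instead of on $_; the ten-letter input is made up for the illustration.

```perl
use strict;
use warnings;

# Count A/T, G/C and other letters and apply the linear formula.
sub melt_temp_sketch {
    my ($seq) = @_;
    $seq =~ tr/a-zA-Z//cd;                       # letters only
    my $n_at = ($seq =~ tr/aAtT//);              # tr without /d only counts
    my $n_gc = ($seq =~ tr/gGcC//);
    my $n_ot = length($seq) - $n_at - $n_gc;     # everything else
    my $temp = 0.88 * $n_at + 1.47 * $n_gc + $n_ot;
    return ($n_at, $n_gc, $n_ot, $temp);
}

# 4 A/T, 4 G/C and 2 other letters: 0.88*4 + 1.47*4 + 2
my ($n_at, $n_gc, $n_ot, $temp) = melt_temp_sketch("acgtacgtnn");
printf "A/T=%d G/C=%d other=%d value=%.2f\n", $n_at, $n_gc, $n_ot, $temp;
```

The result is a relative figure for ranking oligos of equal length; the Tm routines that follow in the listing obtain predicted temperatures from external programs instead.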

sub tm_pred($) {
  my ($seq) = @_;
  my $temp = 0;
  my $conf = 0;

  if(1) {
    $seq =~ tr/acgtACGT//cd;

    # Oligoconc is the sum of target and probe concentrations, molar
    my $output =
      `$glb->{lynx} -cookie_file=$glb->{lynx_cookie} -source $glb->{tm_exiqon_url}?sequences='$seq'&saltconc='0.115'&oligoconc='0.000002'`;
    print "$glb->{lynx} -cookie_file=$glb->{lynx_cookie} -source $glb->{tm_exiqon_url}?sequences='$seq'\n";
    print "$output\n";
    if( $output =~
        /<td>([\d.]+)<\/td>\s+<td>([\d.]+)<\/td><\/tr>/s ) {
      $temp = $1;
      $conf = $2;
    }
  }
  ($conf, $temp);
}


sub tm($$@) {
  my ($glb, $param, @seq) = @_;
  my $temp = 99999;
  my $conf = 0;
  my $comand;
  my $seqfn = "$glb->{tmpdir}/${$}tm.seq";
  my $resultfn = "$glb->{tmpdir}/${$}tm.dat";
  local *FILE;
  local *OLDERR;
  my $score_min;
  my $score_max;
  my $score;
  my $wsum;
  my $req;
  my $res;
  my $output;
  my @result;
  my $seq;
  my $content = '';

  foreach $seq ( @seq) {
    $seq =~ tr/acgtACGT//cd;
    if( $content ) {
      $content .= "\n$seq";
    }
    else {
      $content = "sequences=$seq";
    }
  }
  #$seq = $seq[0];

  $req = HTTP::Request->new( POST => $glb->{tm_url});
  $req->content_type( 'application/x-www-form-urlencoded');
  #$req->content( 'tmuser=oligod&email=tolstrup@exiqon.com');
  #$req->content( "sequences=$seq");
  $req->content( $content);

  $res = $glb->{ua}->request( $req);
  #print "$seq\n";
  #$output = $res->content;
  #print "$output\n";

  # check the outcome
  if ($res->is_success) {
    $output = $res->content;
  } else {
    print "Unsuspected Tm result:" . $res->status_line . "\n";
  }

  #my $output =
  #  `$glb->{lynx} -cookie_file=$glb->{lynx_cookie} -source $glb->{tm_exiqon_url}?sequences='$seq'`;
  #print "#### THE OUTPUT WAS \n$output\n";
  if( defined $output ) {
    while ( $output =~
            /<td>([\d.]+)<\/td>\s+<td>([\d.]+)<\/td><\/tr>/gis ) {
      $temp = $2;

      $score_min = squash( $param->{oligod_param}{tm_min}{squash_factor} *
                   ($temp - $param->{oligod_param}{tm_min}{cutoff}));
      $score_max = squash( $param->{oligod_param}{tm_max}{squash_factor} *
                   ($param->{oligod_param}{tm_max}{cutoff} - $temp));

      $wsum = $score_min * $param->{oligod_param}{tm_min}{weight} +
              $score_max * $param->{oligod_param}{tm_max}{weight};

      $score = squash( $param->{oligod_param}{tm}{squash_factor} *
               ( $wsum - $param->{oligod_param}{tm}{cutoff}));

      push @result, { "tm",     { "raw", $wsum, "score", $score},
                      "tm_min", { "raw", $temp, "score", $score_min},
                      "tm_max", { "raw", $temp, "score", $score_max} };
    }
  }
  else {
    print STDERR "Sorry, the tm prediction is not available.";
    foreach (@seq) {
      push @result, { "tm",     { "raw", 0, "score", 0},
                      "tm_min", { "raw", 0, "score", 0},
                      "tm_max", { "raw", 0, "score", 0} };
    }
  }

  \@result;
}


sub tm_dan($$$) {
  my ($glb, $param, $seq) = @_;
  my $temp = 0;
  my $comand;
  my $seqfn = "$glb->{ tmpdir}/${$}dan.seq";
  my $resultfn = "$glb->{ tmpdir}/${$}dan.dat";
  local *FILE;
  local *OLDERR;
  $seq =~ tr/acgtACGT//cd;
  my $windowsize = length( $seq);
  my $score_min;
  my $score_max;
  my $score;
  my $wsum;

  write_file( $seq, $seqfn);

  $comand = "$glb->{ dan} -sequence $seqfn -windowsize " .
            "$windowsize -shiftincrement 1 " .
            " -dnaconc $param->{ dnaconc} -saltconc $param->{ saltconc} " .
            "-outfile $resultfn";

  # Silence STDERR while the external program runs, then restore it.
  open( OLDERR, ">&STDERR");
  open( STDERR, ">/dev/null");
  my $output = `$comand`;
  close( STDERR);
  open( STDERR, ">&OLDERR");

  open( FILE, "<$resultfn") || die "Could not open $resultfn";
  while( <FILE>) {
    if( /Tm=([\d.]+)/) {
      $temp = $1;
    }
    elsif( /^\s+\d+\s+\d+\s+([0-9.]+)/) {
      $temp = $1;
    }
  }
  close FILE;
  unlink $seqfn;
  unlink $resultfn;

  $score_min = squash( $param->{ oligod_param}{ tm_dan_min}{ squash_factor} *
                       ($temp - $param->{ oligod_param}{ tm_dan_min}{ cutoff}));
  $score_max = squash( $param->{ oligod_param}{ tm_dan_max}{ squash_factor} *
                       ($param->{ oligod_param}{ tm_dan_max}{ cutoff} - $temp));

  $wsum = $score_min * $param->{ oligod_param}{ tm_dan_min}{ weight} +
          $score_max * $param->{ oligod_param}{ tm_dan_max}{ weight};

  $score = squash( $param->{ oligod_param}{ tm_dan}{ squash_factor} *
                   ( $wsum - $param->{ oligod_param}{ tm_dan}{ cutoff}));

  ({ "raw", $wsum, "score", $score},
   { "raw", $temp, "score", $score_min},
   { "raw", $temp, "score", $score_max});
}
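
The two-sided melting-temperature score used in tm_dan can be sketched in isolation as follows. The listing does not show squash() in this section, so squash_sketch below is a hypothetical logistic stand-in, and the cutoff, squash_factor and weight values are made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for squash(): a logistic curve mapping any real
# number into the interval (0, 1). The real squash() may differ.
sub squash_sketch {
    my ($x) = @_;
    return 1 / (1 + exp(-$x));
}

# Score a temperature against a lower and an upper cutoff and combine
# the two one-sided scores with their weights, as tm_dan does.
# $min and $max are hash refs with keys cutoff, squash_factor, weight.
sub tm_window_score {
    my ($temp, $min, $max) = @_;
    my $score_min = squash_sketch($min->{squash_factor} * ($temp - $min->{cutoff}));
    my $score_max = squash_sketch($max->{squash_factor} * ($max->{cutoff} - $temp));
    my $wsum = $score_min * $min->{weight} + $score_max * $max->{weight};
    return ($wsum, $score_min, $score_max);
}

# Illustrative parameters (assumptions, not values from the listing).
my %min = (cutoff => 60, squash_factor => 1, weight => 0.5);
my %max = (cutoff => 80, squash_factor => 1, weight => 0.5);

my ($inside)  = tm_window_score(70, \%min, \%max);   # between the cutoffs
my ($outside) = tm_window_score(50, \%min, \%max);   # below the lower cutoff
printf "inside=%.3f outside=%.3f\n", $inside, $outside;
```

A temperature between the two cutoffs scores high on both sides, while one outside the window loses the contribution of the side it violates.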

############ SELF HYBRIDISATION

sub default_mat ($) {
  my ( $mat_name) = @_;
  my $mat;

  if( $mat_name eq "lna" ) {
    $mat =
";D LNA hybridisation scoring matrix
1 45 80 5 6 80 4
";
  }
  elsif( $mat_name eq "dna" ) {
    $mat =
";D hybridisation scoring matrix
1 45 80 5 6 80 4
-8 -50
* 9 0 1 2
";
  }
  else {
    die "Ups $mat_name was not recognised try lna or dna";
  }

  $mat;
}
sub self_match($$$) {
  my ( $glb, $param, $seq) = @_;
  local *PIPE;
  my $seqfn = "$glb->{ tmpdir}/${$}seq.fasta";
  my $compseqfn= "$glb->{ tmpdir}/${$}compseq.fasta";
  my $f_seq;
  my $score = 0;
  my $norm_score = 0;
  my $alignment= '';
  local *OLDERR;
  my $evidence = '';
  my $ssparam = '';
  my $str;

  if( $param->{ matrix} ) {
    $ssparam = " -s $param->{ matrix} ";
  }

  open( OLDERR, ">&STDERR");
  open( STDERR, ">/dev/null") || die "Can't open /dev/null $!";

  # Write the oligo and its reverse to two FASTA files; LNA bases are
  # recoded with the letters L, I, O and U for the scoring matrix.
  $f_seq->{ seq} = $seq;
  $f_seq->{ name} = "seq";
  if( $param->{ lna} ) {
    $f_seq->{ seq} =~ tr/ACGT/LIOU/;
  }
  print_fasta_fn( $f_seq, $seqfn);
  $f_seq->{ seq} = rev( $seq);
  $f_seq->{ name} = "rev";
  if( $param->{ lna} ) {
    $f_seq->{ seq} =~ tr/ACGT/LIOU/;
  }
  print_fasta_fn( $f_seq, $compseqfn);
  if( $glb->{ verbose} > 1 ) { print "<SSEARCH>"; }
  #print "$glb->{ ssearch} $ssparam $seqfn $compseqfn\n";

  open( PIPE, "$glb->{ ssearch} $ssparam $seqfn $compseqfn |") ||
    die "Failed to open pipe $!";
  while ( <PIPE>){
    if( /Smith-Waterman score:\s*(\d+);/ ) {
      $score = $1;
    }
    if( $glb->{ verbose} > 1 ) { print; }
    if( /^[ .:\t]+$/) {
      tr/.:/:/;
    }
    if ( $param->{ lna} ) {
      if( /^(\S+)(.*)/) {
        $alignment .= $1;
        $str = $2;
        $str =~ tr/ACGTLIOU/acgtACGT/;
        $alignment .= "$str\n";
      }
      else {
        $alignment .= $_;
      }
    }
    else {
      $alignment .= $_;
    }
  }
  close PIPE;

  if( $glb->{ verbose} > 1 ) { print "</SSEARCH>\n"; }

  if( $alignment =~ /(Smith-Waterman score.*)Library sca/s) {
    $evidence = $1;
    if( $glb->{ verbose} > 1 ) {
      print "<SELF_EVIDENCE>\n";
      print "$evidence";
      print "</SELF_EVIDENCE>\n";
    }
  }

  close( STDERR);
  open( STDERR, ">&OLDERR");

  unlink $seqfn;
  unlink $compseqfn;

  $norm_score = squash( $param->{ oligod_param}{ self_match}{ squash_factor} *
                        ( $param->{ oligod_param}{ self_match}{ cutoff} -
                          $score));

  { "score", $norm_score, "raw", $score, "evidence", $evidence};
}


sub self_hyp($$@) {
  my ( $glb, $param, @seqs ) = @_;
  local *PIPE;
  my $score = 0;
  my $norm_score = 0;
  my $evidence = '';
  my $seqfn = "$glb->{ tmpdir}/${$}seq_d.fasta";
  local *FILE;
  my $seq;
  my $i = 0;
  my @result = ();
  my $dyp_opt = ' -min_score 0 -max_hyp 1 ';
  my $read;

  open( FILE, "> $seqfn") || die "Could not open $seqfn";
  foreach (@seqs) {
    $seq->{ seq} = $_;
    $seq->{ name} = sprintf( "seq_%03d", $i);
    $i++;
    print_fasta( $seq, \*FILE);
  }
  close FILE;

  if( $glb->{ verbose} > 1 ) { print "<DYP>"; }

  #print "$glb->{ dyp} -f $seqfn $dyp_opt\n";
  open( PIPE, "$glb->{ dyp} -f $seqfn $dyp_opt|") ||
    die "Failed to open pipe $!";
  $read = 1;
  while ( <PIPE>){
    if( $glb->{ verbose} > 1 ) { print; }
    if( />(\S*)/) {
    }
    elsif( /Score=\s*(\d+)/) {
      # A new score line closes the record of the previous sequence.
      if ( $evidence ) {
        $norm_score = squash( $param->{ oligod_param}{ self_hyp}{ squash_factor} *
                              ( $param->{ oligod_param}{ self_hyp}{ cutoff} -
                                $score));
        my $evidence_str = "$evidence";
        push @result,
          { "score", $norm_score, "raw", $score, "evidence",
            $evidence_str};
        $evidence = '';
      }
      $score = $1;
      $read = 1;
    }
    elsif( /Mask:/ ) {
      $read = 0;
    }
    elsif( $read) {
      $evidence .= $_;
    }
  }
  close PIPE;

  $norm_score = squash( $param->{ oligod_param}{ self_hyp}{ squash_factor} *
                        ( $param->{ oligod_param}{ self_hyp}{ cutoff} -
                          $score));
  push @result, { "score", $norm_score, "raw", $score,
                  "evidence", $evidence};

  if( $glb->{ verbose} > 1 ) { print "</DYP>\n"; }
  if( $glb->{ verbose} > 1 ) {
    print "<SELF_EVIDENCE>\n";
    print "$evidence";
    print "</SELF_EVIDENCE>\n";
  }

  @result;
}


sub target_struct ($$$){
  my ( $glb, $param, $seq) = @_;
  local *PIPE;
  my $score = 0;
  my $evidence = '';
  my $seqfn = "$glb->{ tmpdir}/${$}seq_t.fasta";
  my @result = ();
  my $min_score= 15;
  my $dyp_opt = "-depth 100 -min_score $min_score";
  my $read_mask = 0;
  my $mask;

  print_fasta_fn( $seq, $seqfn);

  #print "$glb->{ dyp} -f $seqfn $dyp_opt";
  open( PIPE, "$glb->{ dyp} -f $seqfn $dyp_opt |") ||
    die "Failed to open pipe $!";
  while ( <PIPE>) {
    print;
    if( $glb->{ verbose} > 1 ) { print; }
    if( />(\S*)/) {
      $mask = '';
    }
    elsif( /Score=\s*(\d+)/) {
      $score = $1;
    }
    elsif( /Mask:/){
      $read_mask = 1;
    }
    elsif( $read_mask) {
      $mask .= $_;
    }
    else {
      $evidence .= $_;
    }
  }
  close PIPE;

  print "<TARGET_STRUCT>\n";
  print $evidence;
  if( $mask ) {
    print $mask;
  }
  else {
    print "No significant secondary structure identified.";
  }
  print "</TARGET_STRUCT>\n";

  $evidence =~ s/\n//g;
  $mask =~ s/\n//g;
  if( $score < $min_score) {
    $mask = "";
  }
  $mask;
}
######## PATTERN FILTERS

sub target_struct_score ( $$$$) {
  my ($mask, $start, $end, $param) = @_;
  my $score = 1;
  my $count = 0;

  if( $mask ) {
    my $len = $end - $start;
    $_ = substr( $mask, $start, $len);
    $count = tr/-/#/;
    my $match = int( $count / $len * 100);
    $score = squash( $param->{ oligod_param}{ target_struct}{ squash_factor} *
                     ( $param->{ oligod_param}{ target_struct}{ cutoff} -
                       $match));
  }

  { "score", $score, "raw", $count, "evidence", $_ };
}

sub cg_score ( $ ) {
  my ($seq) = @_;
  if( $seq =~ /cg/i) {
    return { "score", 0, "raw", 1};
  }
  { "score", 1, "raw", 0};
}

sub ggg_score ( $$) {
  my ($param, $seq) = @_;
  my $maxg = 0;
  my $ng;
  my $score;

  # Find the longest uninterrupted run of G in the sequence.
  while( $seq =~ /(g+)/gi) {
    $ng = length( $1);
    if( $ng > $maxg) { $maxg = $ng; }
  }

  $score = squash( $param->{ oligod_param}{ ggg}{ squash_factor} *
                   ( $param->{ oligod_param}{ ggg}{ cutoff} - $maxg));
  { "score", $score, "raw", $maxg};
}
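
The run-length scan in ggg_score generalises to any single base. The sketch below is a standalone illustration; the function name longest_run and its second argument are assumptions introduced here, not part of the listing.

```perl
use strict;
use warnings;

# Length of the longest uninterrupted run of one base, case-insensitive.
# longest_run() and its $base argument are hypothetical names.
sub longest_run {
    my ($seq, $base) = @_;
    my $max = 0;
    while ($seq =~ /(\Q$base\E+)/gi) {
        my $n = length($1);
        $max = $n if $n > $max;
    }
    return $max;
}

print longest_run("acGGGtggAGGGGc", "g"), "\n";
```

The raw run length would then be passed through the squash function against the configured cutoff, as ggg_score does.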


############ PROBE ############ 

sub oligo_property_score ($$) {
  my ($param, $oligo_property) = @_;
  my $oligo_par;
  my $name;
  my @input;
  my @weight;
  my $output;

  foreach $name ( keys %{$param->{ oligod_param}} ) {
    $oligo_par = $param->{ oligod_param}{ $name};
    if( $oligo_par->{ weight} && $oligo_par->{ layer} ) {
      #print "name = $name\n";
      #print "score = $oligo_property->{ $name}{ score}\n";
      #print "weight = $oligo_par->{ weight}\n";
      push @input, $oligo_property->{ $name}{ score};
      push @weight, $oligo_par->{ weight};
    }
  }

  $output = perceptron( \@weight, \@input);
  $output;
}
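
How the per-property scores are gathered and combined can be sketched as below. perceptron() is not shown in this section of the listing, so perceptron_sketch is a hypothetical stand-in (a weight-normalised sum); the property names and numbers are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for perceptron(): a weighted sum of the inputs
# divided by the total weight. The real perceptron() may differ.
sub perceptron_sketch {
    my ($weight, $input) = @_;
    my ($sum, $total) = (0, 0);
    for my $i (0 .. $#$weight) {
        $sum   += $weight->[$i] * $input->[$i];
        $total += $weight->[$i];
    }
    return $total ? $sum / $total : 0;
}

# Illustrative configuration and scores (assumptions).
my %oligod_param = (
    ggg        => { weight => 1, layer => 1 },
    self_match => { weight => 3, layer => 1 },
    cg         => { weight => 0, layer => 1 },   # weight 0: left out
);
my %property = (
    ggg        => { score => 1.0 },
    self_match => { score => 0.5 },
    cg         => { score => 0.0 },
);

# Collect scores and weights only for properties that are switched on.
my (@input, @weight);
for my $name (sort keys %oligod_param) {
    my $par = $oligod_param{$name};
    next unless $par->{weight} && $par->{layer};
    push @input,  $property{$name}{score};
    push @weight, $par->{weight};
}
printf "%.3f\n", perceptron_sketch(\@weight, \@input);
```

Setting a property's weight to 0 removes it from the combined score without touching the code that computes it.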

sub print_oligo_property ($$) {
  my ($param, $oligo_property) = @_;
  my $name;
  my $oligo_par;

  print "<SCORE_EVIDENCE>\n";
  printf( "%-11s %5.2f\n", "oligo score:", $oligo_property->{ score});
  foreach $name ( sort keys %{$param->{ oligod_param}} ) {
    $oligo_par = $param->{ oligod_param}{ $name};
    if( $oligo_par->{ weight}) {
      printf( " %-13s %7g score=%5.2f x %3g (%6g %5g %4g)\n",
              $name,
              $oligo_property->{ $name}{ raw},
              $oligo_property->{ $name}{ score},
              $oligo_par->{ weight},
              $oligo_par->{ cutoff},
              $oligo_par->{ squash_dx},
              $oligo_par->{ squash_dy});
    }
  }
  print "</SCORE_EVIDENCE>\n\n";
}

sub collect_score ($$$$$$$) {
  my ($prev_region,
      $start_count, $end_count, $start_score, $end_score,
      $prev_pos,
      $region) = @_;
  my $next_region;
  my $extra_region;

  if( $start_count && $end_count ) {
    # Hits both start and end at this position: emit a one-nucleotide
    # region for the position itself.
    if( $prev_region ) {
      $prev_region->{ end} = $prev_pos - 1;
      $next_region->{ hit_count}= $prev_region->{ hit_count};
      $next_region->{ score} = $prev_region->{ score};
      $extra_region->{ hit_count}= $prev_region->{ hit_count};
      $extra_region->{ score} = $prev_region->{ score};
    }
    $next_region->{ start} = $prev_pos + 1;
    $extra_region->{ start} = $prev_pos;
    $extra_region->{ end} = $prev_pos;
    $extra_region->{ length} = 1;
    $extra_region->{ hit_count}+= $start_count;
    $extra_region->{ score} += $start_score;

    $next_region->{ start} = $prev_pos + 1;
    $next_region->{ hit_count} += $start_count;
    $next_region->{ score} += $start_score;
    $next_region->{ hit_count} -= $end_count;
    $next_region->{ score} -= $end_score;
  }
  elsif( $start_count ) {
    if( $prev_region ) {
      $prev_region->{ end} = $prev_pos - 1;
      $next_region->{ hit_count}= $prev_region->{ hit_count};
      $next_region->{ score} = $prev_region->{ score};
    }
    $next_region->{ start} = $prev_pos;
    $next_region->{ hit_count} += $start_count;
    $next_region->{ score} += $start_score;
  }
  elsif( $end_count) {
    if( $prev_region ) {
      $prev_region->{ end} = $prev_pos;
      $next_region->{ hit_count}= $prev_region->{ hit_count};
      $next_region->{ score} = $prev_region->{ score};
    }
    $next_region->{ start} = $prev_pos + 1;
    $next_region->{ hit_count} -= $end_count;
    $next_region->{ score} -= $end_score;
  }
  else {
    $next_region->{ start} = $prev_pos;
    $next_region->{ hit_count} = 0;
    $next_region->{ score} = 0;
  }

  if ( $prev_region) {
    $prev_region->{ length} = $prev_region->{ end} -
                              $prev_region->{ start}+1;
    if( $prev_region->{ length} >0 ) {
      push @$region, $prev_region;
    }
  }
  if( $extra_region ) {
    push @$region, $extra_region;
  }

  ($prev_region, $next_region);
}


sub print_hit ($$) {
  my ($hit, $newline) = @_;
  my $nl = "";
  if( $newline) { $nl = "\n";}
  printf( "%4d - %4d %3d%% %4d %10s %6d - %6d %s $nl",
          $hit->{ qstart}, $hit->{ qend},
          $hit->{ ident}, $hit->{ score},
          $hit->{ tname}, $hit->{ tstart}, $hit->{ tend},
          $hit->{ tstrand});
}

######## IDENTIFY THE SEARCH SEQUENCE IN THE DATABASE 


sub check_self_hit ($$) {
  my ($qlen, $hit) = @_;
  my $glb_ident = 0;
  if( $qlen ) {
    $glb_ident = (($hit->{ qend} - $hit->{ qstart}) * $hit->{ ident})/
                 $qlen;
  }
  if( $glb_ident > 95) {
    return $glb_ident;
  }
  0;
}


sub overlap ( $$) {
  my ($hit1, $hit2) = @_;
  my $overlap = 1;
  my $minslack = 5;
  my $maxslack = 30;
  my $len = min( $hit2->{qend}-$hit2->{ qstart},
                 $hit1->{qend}-$hit1->{ qstart});
  my $slack = max( $minslack, min( $maxslack, $len * 0.1));
  if( ( $hit1->{ qstart} + $slack) > $hit2->{ qend} ||
      ( $hit2->{ qstart} + $slack) > $hit1->{qend} ) {
    $overlap = 0;
  }
  $overlap;
}
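
The slack-tolerant overlap test above can be tried out in isolation as follows. This is a sketch only: it uses min and max from List::Util in place of the program's own helpers, and the hit coordinates are invented for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Two hits count as overlapping only if they share more than $slack
# query positions; $slack is 10% of the shorter hit, clamped to 5..30.
sub hits_overlap {
    my ($hit1, $hit2) = @_;
    my ($minslack, $maxslack) = (5, 30);
    my $len = min($hit2->{qend} - $hit2->{qstart},
                  $hit1->{qend} - $hit1->{qstart});
    my $slack = max($minslack, min($maxslack, $len * 0.1));
    return 0 if $hit1->{qstart} + $slack > $hit2->{qend}
             || $hit2->{qstart} + $slack > $hit1->{qend};
    return 1;
}

# Illustrative hits (assumptions).
my $a_hit = { qstart => 1,   qend => 200 };
my $b_hit = { qstart => 150, qend => 400 };   # shares 50 positions with $a_hit
my $c_hit = { qstart => 195, qend => 400 };   # shares only 5 positions

print hits_overlap($a_hit, $b_hit), " ", hits_overlap($a_hit, $c_hit), "\n";
```

The slack lets adjacent exons that touch or overlap by a few bases still be treated as separate hits.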


sub check_coverage ($@) {
  my ($qlen, @exon) = @_;
  my $hit;
  my $prev_end = 0;
  my $overlap;
  my $maxgap = 30;
  my $maxoverlap = 0;

  print "<GENE>\n";
  print "Coverage of the query ( $qlen):\n";
  foreach $hit ( sort {$a->{ qstart} <=> $b->{ qstart}} @exon) {
    $overlap = $hit->{ qstart} - $prev_end;
    if( abs( $overlap) > $maxoverlap) { $maxoverlap = abs( $overlap); }
    if ( $prev_end) {
      print " $overlap\n";
    }
    print_hit( $hit, 0);
    $prev_end = $hit->{ qend};
  }
  if ( $prev_end) {
    $overlap = $qlen - $prev_end;
    if ( abs( $overlap) > $maxoverlap) { $maxoverlap = abs( $overlap); }
    print " $overlap\n";
  }
  if( $maxoverlap > $maxgap ) {
    print "Warning: The query sequence was not found completely in " .
          "the database\n";
  }
}


sub find_self ($%) {
  my ( $param, %blast_parse) = @_;
  my $hit;
  my @exon;
  my $overlap_flag;
  my $overlap_hit;

  foreach $hit ( sort {$b->{score} <=> $a->{score}}
                 @{$blast_parse{ hits}} ) {
    if( $hit->{ qlen} >= $param->{ gene_len_cutoff} &&
        $hit->{ ident} >= $param->{ gene_ident_cutoff} ) {
      $hit->{ self} = 1;
      $overlap_flag = 0;
      foreach ( @exon) {
        if( overlap( $_, $hit) ) {
          $overlap_flag = 1;
          my %new_var = %$_;
          $overlap_hit = \%new_var;
          last;
        }
      }
      if( ! $overlap_flag ) {
        push @exon, $hit;
      }
      else {
        #print "Overlap to:\n";
        #print_hit( $overlap_hit, 1);
      }
    }
  }
  check_coverage( $blast_parse{ qlen}, @exon);
}


sub print_exon ( @ ) {
  my (@exon) = @_;
  my $hit;
  my $i = 0;
  foreach $hit ( sort { $a->{ qstart} <=> $b->{ qstart}} @exon) {
    $i++;
    print "$hit->{ qstart} - $hit->{ qend}\n";
  }
}


############################################################

sub blast_param2param_line ($ ) {
  my ( $blast_param) = @_;
  my %key2opt = ( "wordlen", "-W", "strand", "-S", "expect", "-e",
                  "filter", "-F", "nproc", "-a");
  my %strand2opt = ("-", 3, "both strands", 3,
                    "direct strand only", 1, "reverse strand only", 2,
                    "top", 1, "bottom", 2);
  my $param_line = '';

  foreach (keys %$blast_param) {
    if( $key2opt{ $_} ) {
      if ( $_ eq "strand" ) {
        $param_line .= " $key2opt{ $_} $strand2opt{ $blast_param->{ $_}} ";
      }
      else {
        $param_line .= " $key2opt{ $_} $blast_param->{ $_} ";
      }
    }
  }
  $param_line;
}

sub param_line2blast_param ( $ ) {
  my ( $param_line) = @_;
  my %key2opt = ( "wordlen", "-W", "strand", "-S", "expect", "-e",
                  "filter", "-F", "nproc", "-a");
  my %opt2strand = ( 3, "both strands", 1, "direct strand only",
                     2, "reverse strand only");
  my $blast_param;
  my %opt2key = reverse %key2opt;
  my %param_hash = split( " ", $param_line);

  foreach (keys %param_hash) {
    if( $opt2key{ $_} ) {
      if( $_ eq "-S" ) {
        $blast_param->{ $opt2key{ $_} } = $opt2strand{ $param_hash{ $_} };
      }
      else {
        $blast_param->{ $opt2key{ $_} } = $param_hash{ $_};
      }
    }
    else {
      die "Sorry, the option $_ has not been implemented yet.";
    }
  }
  $blast_param;
}
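
The two conversion routines are inverses of each other, which a small round trip makes visible. The sketch below is a simplified standalone version: the names to_line and from_line are hypothetical, the strand mapping is left out, and keys are sorted so that the output order is predictable.

```perl
use strict;
use warnings;

# Same key-to-option table as in the listing, minus "strand".
my %key2opt = (wordlen => "-W", expect => "-e", filter => "-F", nproc => "-a");
my %opt2key = reverse %key2opt;

# Parameter hash -> command-line fragment (hypothetical helper name).
sub to_line {
    my ($blast_param) = @_;
    my @parts;
    for my $key (sort keys %$blast_param) {
        next unless $key2opt{$key};
        push @parts, "$key2opt{$key} $blast_param->{$key}";
    }
    return join " ", @parts;
}

# Command-line fragment -> parameter hash (hypothetical helper name).
sub from_line {
    my ($line) = @_;
    my %pairs = split " ", $line;
    my %blast_param;
    for my $opt (keys %pairs) {
        die "option $opt is not supported\n" unless $opt2key{$opt};
        $blast_param{ $opt2key{$opt} } = $pairs{$opt};
    }
    return \%blast_param;
}

my $line = to_line({ wordlen => 30, expect => 100, filter => "F" });
print "$line\n";
my $back = from_line($line);
print "$back->{expect} $back->{filter} $back->{wordlen}\n";
```

Because the line is split on whitespace into option/value pairs, this scheme only works for options that take exactly one value.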


sub position_score ($$$) {
  my ( $glb, $param, $seq) = @_;
  my $hit;
  my $hit_id = 0;
  my $blast_out;
  my %blast_parse;

  # -e 100 : show all hits 50 = down to 17 nuc in human
  # -F F   : Don't filter
  # -S 1   : Search only top strand of query, default is 3=both strands
  # -a 4   : Use four CPU's
  # -W 30  : Word length 30 (default is 11)

  my @end_points;
  my @region;
  my $prev_pos = 1;
  my $p;
  my $score;
  my $hit_count;
  my $prev_region;
  my $next_region;
  my $one_nuc_region;
  my $start_score;
  my $end_score;
  my $start_count;
  my $end_count;
  my $ident;
  my $ignore = 0;

  my $param_line = blast_param2param_line( $param->{ blast_param});

  $blast_out = blast( $glb, $seq, $param->{ blastdb}, "blastn",
                      $param_line);

  if( $glb->{ view} =~ /BLAST/ ) {
    print "<BLAST>$blast_out</BLAST>\n";
  }
  %blast_parse= parse_blast( $blast_out);
  find_self( $param, %blast_parse);

  # push @{$blast_parse{ hits}}, {
  #   "query_name", "X", "tname", "X", "score", 10,
  #   "qstrand", "P", "tstrand", "P", "qstart", 252, "qend", 269, "ident",
  #   0};

  foreach $hit (@{ $blast_parse{ hits}}) {
    #if( $ident = check_self_hit( $blast_parse{ qlen}, $hit) ) {
    if( $hit->{ self}) {
      if ( ! $ignore ) {
        $ignore = 1;
        print "Ignoring the following hit(s):\n";
      }
      $ident = sprintf( "%3.0f", $hit->{ ident});
      printf( "%30s %6d %6d %s\n", $hit->{ tname}, $hit->{ tstart},
              $hit->{ tend},
              "$hit->{ ident}% identity to $blast_parse{ query_name}");
    }
    else{
      # Record both end points of every non-self hit.
      push @end_points, {"hit_id", $hit_id, "pos", $hit->{qstart},
                         "type", "start"};
      push @end_points, {"hit_id", $hit_id, "pos", $hit->{qend},
                         "type", "end"};

      if (0) {
        print "$blast_parse{ query_name}, $hit->{ tname} " .
              "$hit->{score} " .
              "$hit->{expect} " .
              "$hit->{qstrand} " .
              "$hit->{tstrand} " .
              "$hit->{qstart} " .
              "$hit->{qend} " .
              "$hit->{ident}\n";
      }
      $hit_id++;
    }
  }

  if( ! $ignore ) {
    print "The query sequence was not found in the database!\n";
  }
  print "</GENE>\n";

  $start_count = 0;
  $end_count = 0;
  $start_score = 0;
  $end_score = 0;

  if( $#end_points >= 0 ) {
    foreach (sort { $a->{pos} <=> $b->{pos}} @end_points) {
      # print "$_->{ pos}\n";
      if( $_->{ pos} > $prev_pos) {
        $prev_region = collect_score( $prev_region,
                                      $start_count, $end_count,
                                      $start_score, $end_score,
                                      $prev_pos,
                                      \@region);
        $prev_pos = $_->{ pos};
        $start_count = 0;
        $end_count = 0;
        $start_score = 0;
        $end_score = 0;
      }
      if( $_->{ type} eq "start" ) {
        $start_score += $blast_parse{ hits}->[ $_->{ hit_id}]->{ score};
        $start_count ++;
      }
      else {
        $end_score += $blast_parse{ hits}->[ $_->{ hit_id}]->{ score};
        $end_count ++;
      }
    }

    ($prev_region, $p) = collect_score( $prev_region,
                                        $start_count, $end_count,
                                        $start_score, $end_score,
                                        $prev_pos,
                                        \@region);

    my $last_region;
    $last_region->{ start} = $prev_region->{ end} + 1;
    $last_region->{ end} = length( $seq->{ seq});
    $last_region->{ length} =
      $last_region->{ end} - $last_region->{ start} + 1;
    $last_region->{ score} = 0;
    $last_region->{ hit_count} = 0;
    if ( $last_region->{ length} > 0 ) {
      push @region, $last_region;
    }
  }

  if(0){
    foreach (@region) {
      print "$_->{ start}, $_->{ end}, $_->{ score}, $_->{ hit_count}, " .
            "$_->{ length}\n";
    }
  }

  (\@region, \%blast_parse);
}

sub hit_score ($$$$) {
  my ($position_score, $start, $end, $param) = @_;
  my $sumscore = 0;
  my $n = 0;
  my $score;
  my $overlap;
  my $maxscore = -9999999999;
  my $norm_score;
  my $result;

  foreach (@$position_score) {
    if( ($_->{ start} <= $end && $_->{ end} >= $start) ) {
      $overlap = min( $end, $_->{ end}) - max( $start, $_->{ start});
      $score = $_->{ score} * $overlap; # / $_->{ length};
      if( $score > $maxscore ) {
        $maxscore = $score;
      }
      if( $score > $param->{ min_score} ) {
        $sumscore += $score;
        #print "($score $_->{ start} $_->{ end} $_->{ score} $_->{ length} $overlap) ";
      }
      $n ++;
    }
  }

  $norm_score = squash( $param->{ oligod_param}{ hit_score}{ squash_factor} *
                        ( $param->{ oligod_param}{ hit_score}{ cutoff} -
                          $sumscore));

  #print "SCORE: $sumscore, $param->{ min_score}, $norm_score\n";
  #$score = int( $sumscore + 0.5); # 0.5 to get correct rounding
  $result = { "score", $norm_score, "raw", $sumscore};

  $result;
}

sub oligo_sort_diff ( $$) {
  my ($a,$b) = @_;
  my $result;

  $result = $a->{dbmatch} <=> $b->{ dbmatch};
  if( ! $result) { $result = $a->{temp} <=> $b->{ temp}; }
  #if( ! $result) { $result = $a->{self_score} <=> $b->{ self_score}; }
  $result;
}

sub find_oligo_hits ($$$) {
  my ($blast_parse, $start, $end) = @_;
  my $hit;
  my $overlap;
  my $qoffset;
  my @hits = ();
  my $qseq;
  my $mseq;
  my $tseq;
  my $l = '';
  my $r = '';

  foreach $hit (@{ $blast_parse->{ hits}}) {
    $overlap = min( $end, $hit->{ qend}) - max( $start, $hit->{ qstart});
    if( !$hit->{ self} && $overlap > 0 ) {
      $qoffset = max( 0, $start - $hit->{ qstart}) + 1;
      # Pad with blanks so the hit lines up with the oligo.
      $l = ' ' x max( ($hit->{ qstart} - $start), 0);
      $r = ' ' x max( ($end - $hit->{ qend}), 0);
      #$qseq = $l . substr( $hit->{ queryseq}, $qoffset, $overlap) . $r;
      $mseq = $l . substr( $hit->{ matchseq}, $qoffset, $overlap) . $r;
      $tseq = $l . substr( $hit->{ targetseq}, $qoffset, $overlap) . $r;
      push @hits, { "overlap", $overlap,
                    "qseq", $qseq, "mseq", $mseq, "tseq", $tseq,
                    "tname", $hit->{ tname} };
    }
  }

  @hits;
}
sub match_score ($$$$) {
  my ($param, $blast_parse, $seq_start, $seq_end) = @_;
  my $myhit;
  my $nm;
  my $max_match = 0;
  my $max_stretch = 0;
  my $l;
  my $max_stretch_score;
  my $max_match_score;
  my @hits = find_oligo_hits( $blast_parse, $seq_start, $seq_end);

  foreach $myhit (sort {$b->{ overlap} <=> $a->{ overlap}} @hits) {
    $_ = $myhit->{ mseq};
    # Longest uninterrupted run of matching positions.
    while( /(\|+)/g ) {
      $l = length( $1);
      if ( $l > $max_stretch) {
        $max_stretch = $l;
      }
    }
    # Total number of matching positions.
    $nm = s/\|/./g;
    if( $nm > $max_match) { $max_match = $nm; }
  }

  $max_match_score = squash(
    $param->{ oligod_param}{ max_match}{ squash_factor} *
    ( $param->{ oligod_param}{ max_match}{ cutoff} -
      $max_match));
  $max_stretch_score = squash(
    $param->{ oligod_param}{ max_stretch}{ squash_factor} *
    ( $param->{ oligod_param}{ max_stretch}{ cutoff} -
      $max_stretch));
  #print "MATCH $max_stretch, $max_match\n";

  ({ "score", $max_match_score, "raw", $max_match},
   { "score", $max_stretch_score, "raw", $max_stretch});
}
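
The two raw quantities that match_score extracts from a BLAST match line, the total number of matches and the longest uninterrupted stretch, can be computed in isolation as sketched below. The helper name match_line_stats and the sample match line are assumptions for illustration.

```perl
use strict;
use warnings;

# Count '|' characters and find the longest run of them in a BLAST
# match line. match_line_stats() is a hypothetical helper name.
sub match_line_stats {
    my ($mseq) = @_;
    my $matches = ($mseq =~ tr/|//);   # tr with an empty replacement only counts
    my $stretch = 0;
    while ($mseq =~ /(\|+)/g) {
        $stretch = length($1) if length($1) > $stretch;
    }
    return ($matches, $stretch);
}

my ($matches, $stretch) = match_line_stats("||| ||||| ||");
print "$matches $stretch\n";
```

Unlike the substitution used in the listing, tr/|// counts without modifying the string, so the match line can still be printed afterwards.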

sub find_oligo ($$$$$$) {
  my ($glb, $seq, $position_score, $blast_parse, $param,
      $seq_number) = @_;
  my $i;
  my $l = length( $seq->{seq}) - $param->{ oligo_length};
  my $s;
  my $temp = 0;
  my $temp_tm = 0;
  my $tm_dan = { "temp", 0};
  my $tm;
  my $self_score = 0;
  my $db_score;
  my @result;
  my $min_hit_score_score = 999999999;
  my @oligo_center = ();
  my $c = 0;
  my $dist;
  my $this_oligo_overlaps = 0;
  my @hits;
  my $hitcount = 0;
  my $nm;
  my $myhit;
  my $lnaseq;
  my $evidence;
  my $count;
  my $max_match;
  my $seq_start;
  my $seq_end;
  my $oligo_property;
  my @seqs;
  my @lnaseqs;
  my @dummy;
  my @self_hyp;
  my $self_hyp_batch = 1;
  my $tm_batch = 1;
  my $tm_result;
  my $target_mask;

  if( $param->{ oligod_param}{ target_struct}{ weight} ) {
    $target_mask = target_struct( $glb, $param, $seq);
  }

  # Cut every candidate oligo out of the target sequence.
  for( $seq_start=1; $seq_start<$l; $seq_start++) {
    $seq_end = $seq_start + $param->{ oligo_length};
    $s = substr( $seq->{ seq}, $seq_start,
                 $param->{ oligo_length});
    $i = $seq_start - 1;
    $seqs[ $i] = $s;
    $lnaseqs[ $i] = capitalize( $s, $param->{cphase},
                                $param->{ cfreq}, $param->{ end_spike_len});
  }

  if( $self_hyp_batch) {
    if( $param->{ oligod_param}{ self_hyp}{ weight} ) {
      if( $param->{ lna}) {
        @self_hyp = self_hyp( $glb, $param, @lnaseqs);
      }
      else {
        @self_hyp = self_hyp( $glb, $param, @seqs);
      }
    }
  }

  if( $tm_batch) {
    if( $param->{ oligod_param}{ tm_min}{ weight} &&
        $param->{ oligod_param}{ tm_max}{ weight} ) {
      $tm = tm( $glb, $param, @lnaseqs);
    }
  }

  # Score every candidate oligo on all enabled properties.
  for( $seq_start=1; $seq_start<$l; $seq_start++) {
    $i = $seq_start - 1;
    $lnaseq = $lnaseqs[ $i];
    $seq_end = $seq_start + $param->{ oligo_length};

    $oligo_property = { "seq", $seqs[ $i], "start", $seq_start,
                        "end", $seq_end};
    $s = $seqs[ $i];

    if( $param->{ oligod_param}{ hit_score}{ weight} ) {
      $oligo_property->{ hit_score} =
        hit_score( $position_score, $seq_start, $seq_end,
                   $param);
    }

    if( $param->{ oligod_param}{ target_struct}{ weight} ) {
      $oligo_property->{ target_struct} =
        target_struct_score( $target_mask, $seq_start, $seq_end,
                             $param);
    }

    if( $param->{ oligod_param}{ max_match}{ weight} ||
        $param->{ oligod_param}{ max_stretch}{ weight} ) {
      ($oligo_property->{ max_match}, $oligo_property->{ max_stretch}) =
        match_score( $param, $blast_parse, $seq_start, $seq_end);
    }

    if( $param->{ oligod_param}{ self_match}{ weight} ) {
      if( $param->{ lna} ) {
        $oligo_property->{ self_match}= self_match( $glb, $param,
                                                    $lnaseq);
      }
      else {
        $oligo_property->{ self_match}= self_match( $glb, $param,
                                                    $s);
      }
    }

    if( $param->{ oligod_param}{ self_hyp}{ weight} ) {
      if( $self_hyp_batch ) {
        $i = $seq_start - 1;
        $oligo_property->{ self_hyp} = $self_hyp[ $i];
      }
      else {
        if( $param->{ lna}) {
          ($oligo_property->{ self_hyp}, @dummy) = self_hyp(
            $glb, $param, ($lnaseq));
        }
        else {
          ($oligo_property->{ self_hyp}, @dummy) = self_hyp(
            $glb, $param, ($s));
        }
      }
    }

    if( $param->{ oligod_param}{ tm_dan_min}{ weight} &&
        $param->{ oligod_param}{ tm_dan_max}{ weight} ) {
      ( $oligo_property->{ tm_dan},
        $oligo_property->{ tm_dan_min}, $oligo_property->{ tm_dan_max}) =
        tm_dan( $glb, $param, $s);
    }

    if( $param->{ oligod_param}{ tm_min}{ weight} &&
        $param->{ oligod_param}{ tm_max}{ weight} ) {
      if( $tm_batch ) {
        $i = $seq_start - 1;
        $tm_result = $tm->[$i];
        $oligo_property->{ tm} = $tm_result->{ tm};
        $oligo_property->{ tm_min} = $tm_result->{ tm_min};
        $oligo_property->{ tm_max} = $tm_result->{ tm_max};
      }
      else {
        $tm_result = tm( $glb, $param, ($lnaseq));
        $oligo_property->{ tm} = $tm_result->[0]{ tm};
        $oligo_property->{ tm_min} = $tm_result->[0]{ tm_min};
        $oligo_property->{ tm_max} = $tm_result->[0]{ tm_max};
      }
    }

    if( $param->{ oligod_param}{ ggg}{ weight} ) {
      $oligo_property->{ ggg} = ggg_score( $param, $s);
    }

    if( $param->{ oligod_param}{ cg}{ weight} ) {
      $oligo_property->{ cg} = cg_score( $s);
    }

    $oligo_property->{ score} =
      oligo_property_score( $param, $oligo_property);

    push @result, $oligo_property;
  }

  # Report the best-scoring oligos, skipping any oligo whose centre lies
  # closer than mindist to an oligo already reported.
  $i = 0;
  foreach $oligo_property ( sort {$b->{ score} <=> $a->{ score}}
                            @result) {
    $this_oligo_overlaps = 0;
    if( $param->{ mindist} ) {
      $c = $oligo_property->{ start} +
           ($oligo_property->{ end} - $oligo_property->{ start}) / 2;
      foreach (@oligo_center) {
        $dist = abs( $_ - $c);
        if( $dist < $param->{ mindist} ) {
          $this_oligo_overlaps = 1;
          last;
        }
      }
    }
    #print "Oligo center is $c, overlap is $this_oligo_overlaps\n";
    if( ! $this_oligo_overlaps) {
      $count++;
      if( $glb->{ view} =~ /PROBE/ ) {
        print "<PROBE>\n";
        print "seqno: $seq_number oligono: $count\n";
      }
      #$temp = melt_temp( $oligo_property->{ seq});
      $lnaseq = capitalize( $oligo_property->{ seq}, $param->{cphase},
                            $param->{ cfreq}, $param->{ end_spike_len});
      #$temp_tm = tm_pred( $lnaseq);
      #$tm_dan = tm_dan( $glb, $param, $oligo_property->{ seq});
      #$self_score = self_match( $glb, $param, $oligo_property->{ seq});
      @hits = find_oligo_hits( $blast_parse, $oligo_property->{ start},
                               $oligo_property->{ end});
      $max_match = 0;
      foreach $myhit (sort {$b->{ overlap} <=> $a->{ overlap}}
                      @hits) {
        $_ = $myhit->{ mseq};
        $nm = s/\|/./g;
        if ( $nm > $max_match) { $max_match = $nm; }
      }

      if( $glb->{ view} =~ /SCORE_EVIDENCE/ ) {
        print_oligo_property( $param, $oligo_property);
      }

      #if( $glb->{ view} =~ /PROBE/ ) {
      #XXX printf( "score=%4.2f tm_ex=$temp_tm", $oligo_property->{ score});
      printf( "score=%4.2f", $oligo_property->{ score});
      if( $param->{ oligod_param}{ tm_min}{ weight} &&
          $param->{ oligod_param}{ tm_max}{ weight}) {
        #print ", tm= $oligo_property->{ tm_min}{ raw}\n\n";
        print "\ttm=$oligo_property->{ tm_min}{ raw}";
      }
      else {
        print "\n";
      }
      if( $param->{ oligod_param}{ tm_dan_min}{ weight} &&
          $param->{ oligod_param}{ tm_dan_max}{ weight}) {
        #print ", tm_dan=$oligo_property->{ tm_dan_min}{ raw}\n\n";
        print "\ttm_dan=$oligo_property->{ tm_dan_min}{ raw}\n\n";
      }
      else {
        print "\n\n";
      }
      print "$lnaseq $oligo_property->{ start} $oligo_property->{ end}\n";
      #}

      if( $glb->{ view} =~ /HITS/ ) {
        print "<HITS>\n";
        # "$oligo_property->{ dbmatch}, " .
        # "$window->{ self_score}, $max_match\n";
        $hitcount = 0;
        foreach $myhit (sort {$b->{ overlap} <=> $a->{ overlap}}
                        @hits) {
          if( $hitcount >= $param->{ maxhits}) {
            my $n = $#hits + 1;
            print "Only showing $param->{ maxhits} hits of $n hits\n";
            last;
          }
          #print "\n$myhit->{ qseq}\n$myhit->{ mseq}\n$myhit->{ tseq}\n";
          $_ = $myhit->{ mseq};
          $nm = s/\|/./g;
          printf( "$myhit->{ mseq} %2d matches\n$myhit->{ tseq} $myhit->{ tname}\n",
                  $nm);
          $hitcount++;
        }
        print "</HITS>\n";
      }

      if( $glb->{ view} =~ /SELF_EVIDENCE/ ) {
        print "<SELF_EVIDENCE>\n";
        if( $param->{ oligod_param}{ self_hyp}{ weight} ) {
          print "$oligo_property->{ self_hyp}{ evidence}";
        }
        if( $param->{ oligod_param}{ self_match}{ weight} ) {
          print "$oligo_property->{ self_match}{ evidence}";
        }
        print "</SELF_EVIDENCE>\n";
      }

      if( $glb->{ view} =~ /TARGET_STRUCT_HIT/ ) {
        print "<TARGET_STRUCT_HIT>\n";
        if( $param->{ oligod_param}{ target_struct}{ weight} ) {
          print "$oligo_property->{ target_struct}{ evidence}";
        }
        print "\n</TARGET_STRUCT_HIT>\n";
      }

      if( $glb->{ view} =~ /PROBE/ ) {
        print "</PROBE>\n";
      }
      push @oligo_center, $c;
      $i++;
    }
    if( $i > $param->{ max_noligo} ) { last; }
  }
}
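
The final selection step of find_oligo is a greedy pick: take oligos in order of decreasing score and skip any whose centre lies within mindist of one already chosen. A minimal standalone sketch follows; pick_spaced and the sample oligos are hypothetical, and the stop condition is simplified to "at most $max oligos".

```perl
use strict;
use warnings;

# Greedy selection of well-separated oligos (hypothetical helper).
# $mindist: minimum distance between oligo centres
# $max:     maximum number of oligos to return
sub pick_spaced {
    my ($mindist, $max, @oligos) = @_;
    my (@centers, @picked);
    for my $o (sort { $b->{score} <=> $a->{score} } @oligos) {
        my $c = $o->{start} + ($o->{end} - $o->{start}) / 2;
        next if grep { abs($_ - $c) < $mindist } @centers;
        push @centers, $c;
        push @picked, $o;
        last if @picked >= $max;
    }
    return @picked;
}

# Illustrative candidates (assumptions).
my @oligos = (
    { start => 10, end => 35,  score => 0.90 },
    { start => 12, end => 37,  score => 0.85 },   # too close to the first
    { start => 80, end => 105, score => 0.70 },
);
my @got = map { $_->{start} } pick_spaced(20, 5, @oligos);
print "@got\n";
```

Sorting by score first means a lower-scoring neighbour can never displace a better oligo that covers the same region.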


########## MAIN

sub main($) {
  my ($glb) = @_;
  my $lastline;
  my $seq;
  my $count = 0;
  my $position_score;
  my $blast_parse;
  my $hit;
  my $check_blast = 1;
  my $param;
  my $starttime;
  my $date;
  my $runtime;
  my $opt;
  my $popt;
  local *INFILE;
  my $oligo_design = 0;

  if (0) {
    $seq = read_file( "out.blast");
    #print $seq;
    parse_blast( $seq);
    exit( 0);
  }

  ($opt, $popt) = parse_argv( $glb);
  read_conf( $glb);

  $param = initialize_param( $popt);

  makedir( $glb->{ tmpdir});

  foreach (@ARGV) {
    if( $_ eq "-") {
      *INFILE = *STDIN;
    }
    else {
      open( INFILE, "<$_") || die "Could not open $_ $!";
    }

    while ( $seq = read_fasta( \$lastline, \*INFILE) ) {
      $count++;
      if( $opt->{ capitalize}) {
        $seq->{ name} = $seq->{ name} . "_lna";
        $seq->{ seq} = capitalize( $seq->{ seq},
                                   $param->{ cphase}, $param->{ cfreq},
                                   $param->{ end_spike_len});
        print_fasta( $seq, \*STDOUT);
      }
      elsif( $opt->{ rev}) {
        $seq->{ seq} = rev( $seq->{ seq});
        $seq->{ name} = $seq->{ name} . "_rev";
        $seq->{ seq} = capitalize( $seq->{ seq}, 2, 3, 0);
        print_fasta( $seq, \*STDOUT);
      }
      elsif( $opt->{ comp} ) {
        $seq->{ seq} = comp( $seq->{ seq});
        $seq->{ name} = $seq->{ name} . "_comp";
        print_fasta( $seq, \*STDOUT);
      }
      elsif( $opt->{ revcomp} ) {
        $seq->{ seq} = revcomp( $seq->{ seq});
        $seq->{ name} = $seq->{ name} . "_revcomp";
        print_fasta( $seq, \*STDOUT);
      }
      elsif( $opt->{ self_score}) {
        my ($evidence, $self_score) =
          self_hyp( $glb, $param, $seq->{ seq});
        print( "score = $self_score\n");
      }
      else{
        $starttime = time();
        if( ! $oligo_design) {
          my ($sec, $min, $hour, $mday, $mon, $year) = (localtime(
            $starttime))[0..5];
          $year += 1900;
          $mon++;
          $date = "$year $mon $mday $hour:$min:$sec";
          $oligo_design = 1;
          print "<PROBE_RUN $date version=$glb->{ version}>\n";
          if( $glb->{ view} =~ /PARAMETER/ ) {
            print dump_param( $glb, $param);
          }
        }

        if( $param->{ oligo_sense} eq "reverse") {
          $seq->{ seq} = revcomp( $seq->{ seq});
        }

        print "<PROBE_DESIGN>\n";
        if( $check_blast && $param->{ fastadb}) {
          my @fastadbs = split( " ", $param->{ fastadb});
          my $bdb;
          foreach ( @fastadbs ) {
            $bdb = formatblastdb( $glb, $_, "F");
            $param->{ blastdb} .= " " . $bdb;
          }
          $check_blast = 0;
        }

        ($position_score, $blast_parse) = position_score( $glb,
                                                          $param, $seq);
        if( $glb->{ view} =~ /FASTA/ ) {
          print "<FASTA>\n";
          print_fasta( $seq, \*STDOUT);
          print "</FASTA>\n";
        }

        find_oligo( $glb, $seq, $position_score, $blast_parse,
                    $param, $count);

        $runtime = time() - $starttime;
        if( $glb->{ view} =~ /RUNTIME/ ) {
          print "<RUNTIME>$runtime s</RUNTIME>\n";
        }
        print "</PROBE_DESIGN>\n";
      }
    }
    close INFILE;
  }

  if( $oligo_design) {
    print "</PROBE_RUN>\n";
  }
}

main( $glb);
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#$Id: dyp.cv 1.15 2002/07/29 06:28:00 nt Exp $ 

DESCRIPTION

Loading sequences:
-f fastafile
-seq sequence

Set min_score to 0 to see all structures
-min_score min_score

Size of sliding window; default is INT_MAX
-depth depth

Number of results to show
-max_res max_res

-secondary_structure

Calculates the secondary structure of an oligo.

-self_annealing

Calculates the binding energy between an oligo and
itself.

-hybridization

Calculates the binding energy between two different
oligos, or an oligo and its target.


* 3 MAJ 2003
Copyright Niels

#include <stdio.h>    /* fprintf, rand.. */
#include <stdlib.h>   /* calloc.. */
#include <stdarg.h>   /* va_list.. */
#include <string.h>   /* strlen.. */
#include <limits.h>   /* USHRT_MAX.. */
#include <math.h>     /* pow */
#include <ctype.h>    /* tolower */
#include "getopt.h"   /* getopt_long */

/* export MALLOC_TRACE=dyp_memtrace
   mtrace dyp $MALLOC_TRACE
#include <mcheck.h>   mtrace
*/


/*************** DEFINES ***************/

#define THEVERSION "1.2 2002-10-11-15-46"
#define TRACEBACK_INT unsigned short
#define TRACEBACK_INT_MAX USHRT_MAX

#define HI printf( "Hello\n");

/* Bit flags selecting the calculation to perform */
#define SECONDARY_STRUCTURE 1
#define SELF_ANEALING 2
#define HYBRIDIZATION 4
#define TM 8

/* Increase the stack by this amount in case of overflow */
#define STACK_CHUNK 255

/* Number of sequences to allocate space for before realloc */
#define SEQS_CHUNK 255

/* Size of sequence to allocate space for before realloc */
#define SEQ_CHUNK 1024

/* Longest possible line in a matrix that can be read */
#define MAX_LINE_LEN 5000

/* Arguments are parenthesized so that expressions expand safely */
#define MIN(a,b) ((a)<(b)?(a):(b))
#define MAX(a,b) ((a)>(b)?(a):(b))


/*************** GLOBAL VARIABLES ***************/

int verbose = 0;
char *program_name;


/*************** STRUCTURES ***************/

typedef struct sequences_t {
  int nseq;
  int maxnseq;
  char **seqs;
} sequences_type;

typedef struct dynamic_str_t {
  int len;
  int maxlen;
  char *s;
} dynamic_str_type;

typedef struct pair_t {
  TRACEBACK_INT i;
  TRACEBACK_INT j;
} pair_type;

typedef struct stack_t {
  TRACEBACK_INT sp;
  TRACEBACK_INT size;
  pair_type *stack;
} stack_type;

typedef struct tokenarray_t {
  int ntok;
  int maxntok;
  char **tok;
} tokenarray_type;

typedef struct matrix_t {
  char *seq1;
  char *seq2;
  int l1;
  int l2;
  int max_size;
  int *d;
} matrix_type;

typedef struct score_rec_t {
  int alph_len;
  int gap_start;
  int gap_cont;
  int loop_score;
  int match_cont_factor;
  int match_threshold;
  int min_strong_ident_score;
  int min_ident_score;
  int min_sim_score;
  char *alph;
  int *mat;
} score_rec_type;

typedef struct param_t {
  char *alph;
  int *mathyp;
  int depth;
  int min_score;
  int max_res;
  sequences_type *seqs;
  sequences_type *seqs2;
  score_rec_type *score_rec;
} param_type;


45 


/*************** SYSTEM ***************/

/* Print a formatted error message to stderr and exit with status 1. */
void die( char *fmt, ...) {
  va_list ap;

  fprintf( stderr, "Uhoh: ");
  va_start( ap, fmt);
  vfprintf( stderr, fmt, ap);
  va_end( ap);
  fprintf( stderr, "\n");
  exit( 1);
}

/* Print an optional message followed by the option summary, then exit. */
void usage( char *fmt, ...) {
  va_list ap;

  va_start( ap, fmt);
  vfprintf( stderr, fmt, ap);
  va_end( ap);
  fprintf( stderr,
           "\nUsage: dyp -h, --help\n"
           "           -v, --verbose\n"
           "           -r, --version\n"
           "           -u, --secondary_structure\n"
           "           -a, --self_anealing\n"
           "           -y, --hybridization\n"
           "           -t, --tm\n"
           "           -o, --output filename\n"
           "           -i, --input filename\n"
           "           -j, --input2 filename2\n"
           "           -s, --seq sequence\n"
           "           -z, --seq2 sequence2\n"
           "           -d, --depth depth\n"
           "           -n, --min_score score\n"
           "           -p, --max_res n\n"
           "           -m, --matrix matfn\n"
           "           -l, --allolig length_of_oligo\n"
           "           -f, --spikefreq freq\n"
           "           -e, --sample n_samples\n");
  fprintf( stderr, "\n");
  exit( 1);
}

void hi( char *fmt, ...) {
  va_list ap;

  fprintf( stdout, "Hi there ");
  va_start( ap, fmt);
  vfprintf( stdout, fmt, ap);
  va_end( ap);
  fprintf( stdout, "\n");
}

/* calloc wrapper that dies on allocation failure. */
void *salloc( int nobj, int size) {
  void *mem;

  mem = calloc( nobj, size);
  /*printf( "calloc: allocating %d objects of size %d bytes\n",
            nobj, size); */
  if( mem == NULL) {
    die( "Could not allocate %d x %d bytes\n", nobj, size);
  }
  return mem;
}

/* fopen wrapper that dies if the file cannot be opened. */
FILE *sfopen( char *fn, const char *mode) {
  FILE *file;

  file = fopen( fn, mode);
  if( file == NULL )
    die( "Could not open %s for %s.", fn, mode);
  return file;
}


/*************** STRING FUNCTIONS ***************/

/* Allocate a string of len spaces. */
char *empty_str( int len) {
  char *str;

  str = calloc( len+1, sizeof( char));
  memset( str, ' ', len);
  str[len] = 0;
  return str;
}

/* Reverse a string in place. */
char *reverse( char *str) {
  int i, len;
  char c;

  len = strlen( str);
  for( i=0; i<len/2; i++) {
    c = str[i];
    str[i] = str[(len-1) - i];
    str[(len-1) - i] = c;
  }
  return str;
}

char *lower( char *s) {
  int i, l = strlen( s);
  for( i=0; i<l; i++)
    s[i] = tolower( s[i]);
  return s;
}

/* Complement of a single nucleotide code; 'x' if the code is unknown. */
char tocomp( char c) {
  char fr_s[35] = "acgtumrwsykvhdbxnACGTUMRWSYKVHDBXN";
  char to_s[35] = "tgcaakywsrmbdhvxxTGCAAKYWSRMBDHVXX";
  char *p;
  char cc;

  p = strchr( fr_s, c);
  if( p ) {
    p = to_s + (p - fr_s);
    cc = *p;
  }
  else
    cc = 'x';
  return cc;
}

/* Complement a sequence in place. */
char *comp( char *s) {
  int i, l = strlen( s);
  for( i=0; i<l; i++)
    s[i] = tocomp( s[i]);
  return s;
}
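
The `reverse` and `comp` helpers above can be combined into a reverse complement. The following is a minimal sketch of that idea, not part of the original listing: the names `complement_base` and `revcomp_sketch` are hypothetical, and the lookup table is copied from `tocomp`.

```c
#include <stdio.h>
#include <string.h>

/* Complement one IUPAC nucleotide code; unknown characters map to 'x'
   (same table as tocomp in the listing). */
static char complement_base(char c) {
    static const char from[] = "acgtumrwsykvhdbxnACGTUMRWSYKVHDBXN";
    static const char to[]   = "tgcaakywsrmbdhvxxTGCAAKYWSRMBDHVXX";
    const char *p = (c != '\0') ? strchr(from, c) : NULL;
    return p ? to[p - from] : 'x';
}

/* Reverse-complement a NUL-terminated sequence in place (hypothetical helper). */
static char *revcomp_sketch(char *s) {
    size_t len = strlen(s);
    size_t i;
    for (i = 0; i < len / 2; i++) {
        /* Swap the two ends while complementing both. */
        char left = complement_base(s[i]);
        s[i] = complement_base(s[len - 1 - i]);
        s[len - 1 - i] = left;
    }
    if (len % 2 == 1)  /* the middle base only needs complementing */
        s[len / 2] = complement_base(s[len / 2]);
    return s;
}
```

Calling `revcomp_sketch` on a writable buffer holding `AAGC` leaves `GCTT` in the buffer; a string literal must not be passed because the function modifies its argument.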


/*************** I/O OF STRINGS ***************/

/* Print nstr equally long strings interleaved in blocks of 80 columns. */
void print_lines( FILE *outfile, char **str, int nstr) {
  int linelen = 80;
  int i, j, p, len;

  if( nstr > 0 ) {
    len = strlen( str[0]);
    for( j=0; j<nstr; j++) {
      if( strlen( str[j]) != len ) {
        die( "String len = %d differs from first string len %d %s\n",
             (int) strlen( str[j]), len, str[j]);
      }
    }
    p = 0;
    while( p < len) {
      for( j=0; j<nstr; j++) {
        for( i = p; i < p + linelen && i < len; i++) {
          putc( str[j][i], outfile);
        }
        putc( '\n', outfile);
      }
      p += linelen;
      putc( '\n', outfile);
    }
  }
}

/*************** STACK ***************/

stack_type *init_stack( TRACEBACK_INT size) {
  stack_type *stack;

  stack = calloc( 1, sizeof( stack_type));
  stack->sp = 0;
  stack->size = size;
  stack->stack = calloc( size, sizeof( pair_type));
  return stack;
}

void free_stack( stack_type *stack) {
  free( stack->stack);
  free( stack);
}

void print_pair( pair_type pair) {
  printf( "%4d, %4d\n", pair.i, pair.j);
}

/* Push a pair, growing the stack by STACK_CHUNK entries when it is full. */
void push( stack_type *stack, pair_type pair) {
  int new_size;

  if( stack->sp < stack->size ) {
    stack->stack[ stack->sp++] = pair;
    if(0) {printf( "pushed :"); print_pair( pair);}
  }
  else {
    new_size = (stack->size + STACK_CHUNK) * sizeof( pair_type);
    if( (stack->stack = realloc( stack->stack, new_size)) ) {
      stack->size = stack->size + STACK_CHUNK;
      stack->stack[ stack->sp++] = pair;
    }
    else {
      die( "Failed to realloc stack to %d bytes", new_size);
    }
  }
}

pair_type setpair( int i, int j) {
  pair_type pair;

  pair.i = i;
  pair.j = j;
  return pair;
}

/* Pop a pair; an empty stack yields (TRACEBACK_INT_MAX, TRACEBACK_INT_MAX). */
pair_type pop( stack_type *stack) {
  if( stack->sp > 0 )
    return stack->stack[ --(stack->sp)];
  else
    return setpair( TRACEBACK_INT_MAX, TRACEBACK_INT_MAX);
}

void test_stack() {
  stack_type *stack;
  int i;

  stack = init_stack( STACK_CHUNK);

  for( i=0; i<30; i++)
    push( stack, setpair( i, i+1));

  for( i=0; i<30; i++)
    print_pair( pop( stack));

  free_stack( stack);
}


/*************** DYNAMIC STRINGS ***************/

void free_dynamic_str_type( dynamic_str_type *seq) {
  free( seq->s);
  free( seq);
}

dynamic_str_type *new_dynamic_str_type() {
  dynamic_str_type *seq;

  seq = calloc( 1, sizeof( dynamic_str_type));
  seq->len = 0;
  seq->maxlen = SEQ_CHUNK;
  seq->s = calloc( seq->maxlen, sizeof( char));
  seq->s[0] = '\0';
  return seq;
}

/* Append one character, growing the buffer by SEQ_CHUNK when needed. */
dynamic_str_type *addchar( dynamic_str_type *seq, int c) {
  if( seq == NULL) {
    seq = new_dynamic_str_type();
  }
  if( seq->len + 1 >= seq->maxlen ) {
    seq->maxlen += SEQ_CHUNK;
    if( !(seq->s = realloc( seq->s, seq->maxlen * sizeof( char))) )
      die( "Failed to realloc seq to %d chars", seq->maxlen);
  }
  seq->s[ seq->len++] = c;
  seq->s[ seq->len] = '\0';  /* realloc does not zero the new memory */
  return seq;
}
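
Both the traceback stack and the dynamic string above grow by a fixed chunk whenever they run out of room. A minimal sketch of that pattern follows; the type `dynstr_sketch`, the function `dynstr_append`, and the chunk size of 1024 are assumptions for illustration. Unlike `die`, the sketch reports failure through its return value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string, mirroring dynamic_str_type. */
typedef struct {
    size_t len;  /* characters stored, excluding the terminator */
    size_t cap;  /* bytes currently allocated */
    char *s;
} dynstr_sketch;

/* Append c, growing by a fixed chunk when the buffer is full.
   Returns 0 on success and -1 if the allocation fails. */
static int dynstr_append(dynstr_sketch *d, char c) {
    if (d->len + 1 >= d->cap) {
        size_t new_cap = d->cap + 1024;   /* assumed chunk size */
        char *p = realloc(d->s, new_cap); /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;                    /* the old buffer is still valid */
        d->s = p;
        d->cap = new_cap;
    }
    d->s[d->len++] = c;
    d->s[d->len] = '\0';                  /* keep the string terminated */
    return 0;
}
```

Assigning the result of `realloc` to a temporary first avoids losing the original pointer when the allocation fails, which matters for callers that want to recover instead of exiting.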


/************* OLIGO FUNCTIONS ********/

/* Number of distinct n-mers over the alphabet: alphlen^n_mer. */
long n_nmer( int n_mer, char *alph) {
  int alphlen = strlen( alph);
  long n_nmer = 0;

  if( alphlen == 0 )
    n_nmer = 0;
  else
    n_nmer = pow( alphlen, n_mer);

  return n_nmer;
}

/* Build the n-mer whose index is number, read as a base-alphlen numeral. */
char *make_nmer( long number, int n_mer, char *alph) {
  long i, j, n = number;
  int alphlen = strlen( alph);
  char *seq;

  seq = empty_str( n_mer);

  if( alphlen == 0 )
    strcpy( seq, "-");
  else {
    for( j=0; j<n_mer; j++) {
      i = n % alphlen;
      n = n / alphlen;
      seq[j] = alph[i];
    }
  }
  seq = reverse( seq);
  return seq;
}

/* Build a random n-mer. With spikefreq set, every spikefreq-th position
   is drawn from the second half of the alphabet and all other positions
   from the first half. */
char *make_random_nmer( int n_mer, int spikefreq, char *alph) {
  long i, j;
  int alphlen = strlen( alph);
  int halflen = alphlen / 2;
  char *seq;
  char phase = 0;

  seq = empty_str( n_mer);

  /* The scaling is done in double to avoid integer overflow when
     RAND_MAX equals INT_MAX. */
  if( spikefreq)
    phase = (char) (((double) rand() * spikefreq) / ((double) RAND_MAX + 1));

  if( alphlen == 0 )
    strcpy( seq, "-");
  else {
    if( spikefreq)
      for( j=0; j<n_mer; j++) {
        i = (long) (((double) rand() * halflen) / ((double) RAND_MAX + 1));
        if( ((j+phase) % spikefreq) == 0)
          i += halflen;
        seq[j] = alph[i];
      }
    else
      for( j=0; j<n_mer; j++) {
        i = (long) (((double) rand() * alphlen) / ((double) RAND_MAX + 1));
        seq[j] = alph[i];
      }
  }
  return seq;
}
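
`make_nmer` treats the index as a numeral in base `strlen(alph)` and maps each digit to a letter. The sketch below shows the same enumeration written to fill the string from the right, so no reversal is needed. The name `nmer_from_index` is hypothetical, and returning `NULL` for an empty alphabet is an assumption that differs from the listing, which returns `"-"`.

```c
#include <stdlib.h>
#include <string.h>

/* Return the n-mer with the given index, or NULL on bad input or
   allocation failure. The caller frees the result. */
static char *nmer_from_index(long number, int n_mer, const char *alph) {
    size_t alphlen = strlen(alph);
    char *seq;
    int pos;

    if (alphlen == 0 || n_mer < 0 || number < 0)
        return NULL;
    seq = malloc((size_t)n_mer + 1);
    if (seq == NULL)
        return NULL;
    /* Fill from the right so the least significant digit ends up last. */
    for (pos = n_mer - 1; pos >= 0; pos--) {
        seq[pos] = alph[number % (long)alphlen];
        number /= (long)alphlen;
    }
    seq[n_mer] = '\0';
    return seq;
}
```

With the alphabet `acgt` and length 3, index 0 gives `aaa`, index 6 gives `acg`, and index 63 gives `ttt`, so looping the index from 0 to `n_nmer - 1` visits every 3-mer once.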

50 


/************* MATRIX I/O ********/

score_rec_type *init_score_rec_type( int gap_start, int gap_cont,
                                     int loop_score, int match_threshold,
                                     int match_cont_factor,
                                     int min_strong_ident_score,
                                     int min_ident_score, int min_sim_score,
                                     char *alph, int *mat) {
  score_rec_type *score_rec;

  score_rec = (score_rec_type *) calloc( 1, sizeof( score_rec_type));
  score_rec->alph_len = strlen( alph);
  score_rec->gap_start = gap_start;
  score_rec->gap_cont = gap_cont;
  score_rec->loop_score = loop_score;
  score_rec->match_threshold = match_threshold;
  score_rec->match_cont_factor = match_cont_factor;
  score_rec->alph = alph;
  score_rec->mat = mat;
  score_rec->min_strong_ident_score = min_strong_ident_score;
  score_rec->min_ident_score = min_ident_score;
  score_rec->min_sim_score = min_sim_score;
  return score_rec;
}

void free_score_rec_type( score_rec_type *score_rec) {
  free( score_rec->alph);
  free( score_rec->mat);
  free( score_rec);
}

void free_param_type( param_type *param) {
  free_score_rec_type( param->score_rec);
  free( param->seqs->seqs);
  free( param->seqs);
  free( param->seqs2->seqs);
  free( param->seqs2);
  free( param);
}

/* Die unless the alphabet is consistent and all scores lie in [min, max]. */
void validate_score_rec_type( score_rec_type *score_rec, int min, int max) {
  int i, j;

  if( strlen( score_rec->alph) < 1) { die( "No alphabet\n"); }
  if( strlen( score_rec->alph) != score_rec->alph_len) {
    die( "alph length mismatch %d,%d\n", (int) strlen( score_rec->alph),
         score_rec->alph_len);
  }
  for( i=0; i<score_rec->alph_len; i++) {
    for( j=0; j<score_rec->alph_len; j++) {
      if( score_rec->mat[i*score_rec->alph_len+j] < min )
        die( "Matrix out of range min=%d", min);
      if( score_rec->mat[i*score_rec->alph_len+j] > max )
        die( "Matrix out of range %d max=%d",
             score_rec->mat[i*score_rec->alph_len+j], max);
    }
  }
  if( score_rec->gap_start > max ) die( "gap_start out of range max");
  if( score_rec->gap_start < min ) die( "gap_start out of range min");
  if( score_rec->gap_cont > max ) die( "gap_cont out of range max");
  if( score_rec->gap_cont < min ) die( "gap_cont out of range min");
  if( score_rec->loop_score > max ) die( "loop_score out of range max");
  if( score_rec->loop_score < min ) die( "loop_score out of range min");
}

/* Print a len2 x len1 matrix with seq1 as column labels and seq2 as row
   labels. */
void print_mat( char *seq1, char *seq2, int *mat) {
  int i, j;
  int len1 = strlen( seq1);
  int len2 = strlen( seq2);
  int p_num = 1;

  if( p_num == 1) {
    printf( "     ");
    for( i=0; i<len1; i++) {
      printf( "%5d", i);
    }
    printf( "\n");
  }
  printf( "     ");
  for( i=0; i<len1; i++) {
    printf( "    %c", seq1[i]);
  }
  printf( "\n\n");
  for( j=0; j<len2; j++) {
    if( p_num == 1) {
      printf( "%2d %c ", j, seq2[j]);
    }
    else {
      printf( "%c ", seq2[j]);
    }
    for( i=0; i<len1; i++) {
      printf( "%5d", mat[ j*len1+i]);
    }
    printf( "\n");
  }
}


/*************** TOKENIZE A LINE ***************/

void addtoken( char *token, tokenarray_type *tokena) {
  tokena->ntok++;
  if( tokena->ntok > tokena->maxntok ) {
    tokena->maxntok += 100;
    /* The array holds char pointers, so it is sized in units of char*. */
    if( !(tokena->tok =
          realloc( tokena->tok, tokena->maxntok * sizeof( char*))) )
      die( "Failed to realloc tokenarray to %d tokens",
           tokena->maxntok);
  }
  tokena->tok[ tokena->ntok - 1] = token;
}

tokenarray_type *alloc_tokenarray_type() {
  tokenarray_type *tokena;

  tokena = calloc( 1, sizeof( tokenarray_type));
  tokena->maxntok = 30;
  tokena->ntok = 0;
  tokena->tok = calloc( tokena->maxntok, sizeof( char*));
  return tokena;
}

void free_tokenarray_type( tokenarray_type *tokena) {
  free( tokena->tok);
  free( tokena);
}

/* Split s in place on spaces and newlines; the tokens point into s. */
void tokenize( char *s, tokenarray_type *tokena) {
  char *token;

  tokena->ntok = 0;
  token = strtok( s, " \n");
  if( token )
    addtoken( token, tokena);
  while( (token = strtok( NULL, " \n")) )
    addtoken( token, tokena);
}

/* Return the number of tokens if they form the list 0, 1, 2, ...;
   otherwise return 0. */
int tokenarray_is_num_list( tokenarray_type *tokena) {
  int i = 0;
  char *endp;
  int res = 0;

  while( i<tokena->ntok && strtol( tokena->tok[i], &endp, 10) == i) {
    if( *endp != 0 )
      res = 0;
    i++;
  }
  if( i == tokena->ntok)
    res = i;
  else
    res = 0;
  return res;
}

/* Join single-character tokens into one string. */
char *tokenarray2str( tokenarray_type *tokena) {
  int i=0;
  char *alph;

  alph = calloc( tokena->ntok + 1, sizeof( char));
  for( i=0; i<tokena->ntok; i++) {
    if( strlen( tokena->tok[i]) != 1)
      die( "Sorry, the word %s was found where a char was expected.",
           tokena->tok[i]);
    alph[i] = *tokena->tok[i];
  }
  alph[i] = 0;  /* EOL */
  return alph;
}
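
The tokenizer above relies on `strtok`, which overwrites delimiters in the input buffer and returns pointers into it. A minimal sketch of the same approach with a fixed-size token array follows. The name `split_tokens` and the `max_tok` limit are hypothetical, and treating a tab as a delimiter is an assumption, since the delimiter string in the scanned listing is only partly legible.

```c
#include <string.h>

/* Split line in place on spaces, tabs and newlines. Stores at most
   max_tok token pointers in tok and returns how many were stored.
   The tokens point into line, so line must outlive them. */
static int split_tokens(char *line, char **tok, int max_tok) {
    int n = 0;
    char *t = strtok(line, " \t\n");
    while (t != NULL && n < max_tok) {
        tok[n++] = t;
        t = strtok(NULL, " \t\n");
    }
    return n;
}
```

Because the tokens alias the line buffer, a caller that reads the next line into the same buffer must copy any token it still needs, which is why the listing copies the row label character before reading further.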


/* Parse one data row "[rownum] char v1 v2 ..." and append it to mt.
   Returns 0 on success, otherwise an error code. */
int tokenarray2dat( tokenarray_type *tokena, int with_num,
                    matrix_type *mt, char *c) {
  int i = 0;
  int j = 0;
  char *endp;
  int err = 0;
  int *dat = NULL;

  if( with_num ) {
    if( atoi( tokena->tok[i]) != mt->l2)
      err = 1;
    i++;
  }
  if( strlen( tokena->tok[i]) != 1)
    err = 2;
  *c = *tokena->tok[i];
  i++;

  if( tokena->ntok - i != mt->l1 )
    err = 3;
  else {
    dat = calloc( mt->l1, sizeof( int));
    for( ; i<tokena->ntok; i++) {
      dat[j] = strtol( tokena->tok[i], &endp, 10);
      if( *endp != 0 )
        err = j+10;
      j++;
    }
  }

  if( !err) {
    /* Make room for one more row of l1 values before writing it. */
    while( mt->l1 * (mt->l2 + 1) > mt->max_size ) {
      mt->max_size += 100;
      if( !(mt->d =
            realloc( mt->d, mt->max_size * sizeof( int))) )
        die( "Failed to realloc matrix to %d int's", mt->max_size);
    }
    for( j=0; j<mt->l1; j++)
      mt->d[mt->l1 * mt->l2 + j] = dat[j];
    mt->l2++;
  }
  free( dat);
  return err;
}

5 

/*********** i/o score_rec_type ***********/

/* Read a score matrix: an optional line of column numbers, a line of
   column names, then one data row per alphabet character. */
matrix_type *read_mat( FILE *file) {
  char *s, *res;
  int n = MAX_LINE_LEN;
  int col_name_found = 0;
  int numbers_found = 0;
  tokenarray_type *tokena;
  dynamic_str_type *seq = NULL;
  char c;
  matrix_type *mt;

  mt = calloc( 1, sizeof( matrix_type));
  mt->max_size = 100;
  mt->d = calloc( mt->max_size, sizeof( int));
  mt->l2 = 0;

  tokena = alloc_tokenarray_type();
  s = calloc( n + 1, sizeof( char));
  seq = new_dynamic_str_type();

  do {
    res = fgets( s, n + 1, file);
    if( res != NULL ) {
      tokenize( s, tokena);
      /*for(i=0;i<tokena->ntok;i++) printf( "%d %s\n", i, tokena->tok[i]);*/
      if( !col_name_found ) {
        /* Is it a column number line? */
        if( !numbers_found && (mt->l1 = tokenarray_is_num_list( tokena)) ) {
          numbers_found = 1;
        }
        else {
          /* Is it a column name line? */
          mt->seq1 = tokenarray2str( tokena);
          if( numbers_found ) {
            if( mt->l1 != (int) strlen( mt->seq1) )
              die( "Expected a string of length %d but found %s",
                   mt->l1, mt->seq1);
          }
          else
            mt->l1 = strlen( mt->seq1);
          col_name_found = 1;
        }
      }
      else {
        /* skip empty lines */
        if( tokena->ntok > 0) {
          /* Is it a data line? */
          if( tokenarray2dat( tokena, numbers_found, mt, &c) == 0) {
            seq = addchar( seq, c);
          }
          else {
            res = NULL;
          }
        }
      }
    }
  } while( res);

  mt->seq2 = calloc( strlen( seq->s) + 1, sizeof( char));
  strcpy( mt->seq2, seq->s);
  free_dynamic_str_type( seq);
  free( s);
  free_tokenarray_type( tokena);
  return mt;
}

void print_score_rec_type( score_rec_type *score_rec) {
  printf( "alph_len %d\n", score_rec->alph_len);
  printf( "gap_start %d\n", score_rec->gap_start);
  printf( "gap_cont %d\n", score_rec->gap_cont);
  printf( "alph %s\n", score_rec->alph);
  print_mat( score_rec->alph, score_rec->alph, score_rec->mat);
}

/* Read the scoring parameters and the score matrix from file fn. */
void read_score_rec_type( char *fn, score_rec_type *score_rec) {
  FILE *file;
  matrix_type *mt;

  file = sfopen( fn, "r");
  fscanf( file, "alph_len %d\n", &(score_rec->alph_len));
  fscanf( file, "gap_start %d\n", &(score_rec->gap_start));
  fscanf( file, "gap_cont %d\n", &(score_rec->gap_cont));
  fscanf( file, "min_strong_ident_score %d\n",
          &(score_rec->min_strong_ident_score));
  fscanf( file, "min_ident_score %d\n", &(score_rec->min_ident_score));
  fscanf( file, "min_sim_score %d\n", &(score_rec->min_sim_score));
  fscanf( file, "alph %s\n", score_rec->alph);
  mt = read_mat( file);
  if( mt->l1 != mt->l2 )
    die( "Matrix not quadratic %d %d", mt->l1, mt->l2);
  if( strcmp( mt->seq1, mt->seq2) != 0)
    die( "Matrix sequences not identical %s %s", mt->seq1, mt->seq2);
  if( strcmp( mt->seq1, score_rec->alph) != 0)
    die( "Matrix alphabet differs from alph %s %s", mt->seq1,
         score_rec->alph);
  if( score_rec->mat )
    free( score_rec->mat);
  score_rec->mat = mt->d;

  free( mt->seq1);
  free( mt->seq2);
  free( mt);
  fclose( file);
}


/*************** MATRIX ***************/

/* Translate a sequence into indices into the scoring alphabet. Unknown
   characters are reported and mapped to the last alphabet entry. */
int *seq2idx_seq( char *seq, score_rec_type *score_rec) {
  char c[2];
  int i, l;
  int *idx_seq;
  int id;

  l = strlen( seq);
  idx_seq = calloc( l, sizeof( int));

  for( i=0; i<l; i++) {
    c[0] = seq[i];
    c[1] = 0;
    id = strcspn( score_rec->alph, c);
    if( id >= score_rec->alph_len) {
      fprintf( stderr,
               "Sorry, the character %c was not found in the alphabet %s\n",
               seq[i], score_rec->alph);
      id = score_rec->alph_len - 1;
    }
    idx_seq[i] = id;
  }
  return idx_seq;
}

/* Bounds-checked index of cell (i, j) in an l2 x l1 row-major matrix. */
int matidx( int i, int j, int l1, int l2) {
  if( i<0 )   die( "matrix index i (%d) less than zero", i);
  if( i>=l1 ) die( "matrix index i (%d) too large (%d)", i, l1);
  if( j<0 )   die( "matrix index j (%d) less than zero", j);
  if( j>=l2 ) die( "matrix index j (%d) too large (%d)", j, l2);
  return j*l1 + i;
}

int matsize( int l1, int l2) {
  return l1*l2;
}

int m( int i, int j, matrix_type *mat) {
  return mat->d[ matidx( i, j, mat->l1, mat->l2)];
}

void print_annot( char *seq, char *annot) {
  printf( "%s\n", seq);
  printf( "%s\n", annot);
}

/* Annotation symbol for the pair (a, b): '|' strong identity,
   ':' identity, '.' similarity, ' ' otherwise. */
char match_sym( char a, char b, score_rec_type *score_rec) {
  int i1, i2;
  char *s;
  int score;

  s = strchr( score_rec->alph, a);
  if( s == NULL) return ' ';
  i1 = s - score_rec->alph;

  s = strchr( score_rec->alph, b);
  if( s == NULL) return ' ';
  i2 = s - score_rec->alph;

  score = score_rec->mat[ i2*score_rec->alph_len+i1];

  if( score >= score_rec->min_strong_ident_score )
    return '|';
  else if( score >= score_rec->min_ident_score )
    return ':';
  else if( score >= score_rec->min_sim_score )
    return '.';
  return ' ';
}
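
`match_sym` reduces to a three-threshold classification once the pair score has been looked up. The sketch below isolates that step. The function name `score_symbol` and the thresholds used in the usage note are hypothetical; real values come from the score file read by `read_score_rec_type`.

```c
/* Map a pair score to the annotation character printed between the two
   aligned sequences. Thresholds are expected in descending order:
   min_strong >= min_ident >= min_sim. */
static char score_symbol(int score, int min_strong, int min_ident, int min_sim) {
    if (score >= min_strong)
        return '|';   /* strong identity */
    if (score >= min_ident)
        return ':';   /* identity */
    if (score >= min_sim)
        return '.';   /* similarity */
    return ' ';       /* no notable pairing */
}
```

With assumed thresholds of 3, 2 and 1, a score of 2 is annotated `:` and a score of 0 is left blank.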


/************* ALIGN - self association ***************/

void align_global_traceback( FILE *outfile, param_type *param,
                             char *seq1, char *seq2, int *path) {
  int l1 = strlen( seq1);
  int l2 = strlen( seq2);
  int done = 0;
  int is = 0;
  int i, j;
  int i1, i2;
  char *qseq, *mseq, *tseq;

  if( l1>0 && l2>0 ) {
    qseq = calloc( l1+l2+1, sizeof( char));
    mseq = calloc( l1+l2+1, sizeof( char));
    tseq = calloc( l1+l2+1, sizeof( char));

    i = l1 - 1;
    j = l2 - 1;
    i1 = l1 - 1;
    i2 = l2 - 1;

    do {
      if( verbose )
        printf( "%d %d\n", i, j);
      if( path[j*l1+i] == 0) {
        if( i1 >= 0 ) {
          qseq[is] = seq1[i1];
          i1--;
        }
        else {
          qseq[is] = '-';
          mseq[is] = ' ';
        }
        if( i2 >= 0 ) {
          tseq[is] = seq2[i2];
          i2--;
        }
        else {
          tseq[is] = '-';
          mseq[is] = ' ';
        }
        /*
        if( tseq[is] == qseq[is] ) { mseq[is] = ':'; }
        else { mseq[is] = ' '; }
        */
        mseq[is] = match_sym( tseq[is], qseq[is], param->score_rec);
        is++;
        i--;
        j--;
        if( i<0 && j<0 ) {
          done = 1;
        }
      }
      else if( path[j*l1+i] == 2) {
        qseq[is] = '-';
        mseq[is] = ' ';
        tseq[is] = seq2[i2];
        i2--;
        is++;
        j--;
      }
      else if( path[j*l1+i] == 1) {
        qseq[is] = seq1[i1];
        i1--;
        mseq[is] = ' ';
        tseq[is] = '-';
        is++;
        i--;
      }
      if( i < 0 ) i = 0;
      if( j < 0 ) j = 0;
      if( verbose )
        printf( "%d %d %c %c %c\n", i, j, qseq[is-1], mseq[is-1],
                tseq[is-1]);
    } while( !done);
    qseq[is] = 0;
    mseq[is] = 0;
    tseq[is] = 0;

    fprintf( outfile, "%s\n", reverse( qseq));
    fprintf( outfile, "%s\n", reverse( mseq));
    fprintf( outfile, "%s\n", reverse( tseq));

    free( qseq);
    free( mseq);
    free( tseq);
  }
}


/* Repeatedly pick the best remaining cell and trace its local alignment
   back, until the score drops below min_score or max_res results have
   been reported. Returns the first (best) score found. */
int align_local_traceback( FILE *outfile, param_type *param,
                           int l1, int l2, int *idx_seq1, int *idx_seq2,
                           char *mask_seq1, char *mask_seq2,
                           char *seq1, char *seq2,
                           int *mat, int *path) {
  stack_type *stack;
  int i, j, k;
  pair_type pair;
  char *annot;
  matrix_type *mt;
  int score, maxscore, firstscore=0, thescore = INT_MAX,
      max_i, max_j;
  int n_hyp = 0;
  char *outstr[3];
  char *qseq, *mseq, *tseq;

  stack = init_stack( STACK_CHUNK);

  mt = calloc( 1, sizeof( matrix_type));
  mt->d = mat;
  mt->l1 = l1;
  mt->l2 = l2;

  qseq = calloc( l1+l2+1, sizeof( char));
  mseq = calloc( l1+l2+1, sizeof( char));
  tseq = calloc( l1+l2+1, sizeof( char));

  while( thescore >= param->min_score && n_hyp < param->max_res ) {
    maxscore = INT_MIN;
    for( j=0; j<l2; j++) {
      for( i=0; i<l1; i++) {
        score = mat[ matidx( i, j, l1, l2)];
        if( score > maxscore) {
          maxscore = score;
          max_i = i;
          max_j = j;
        }
      }
    }
    if( !firstscore && maxscore >= param->min_score) firstscore = maxscore;

    push( stack, setpair( max_i, max_j));
    thescore = mat[ matidx( max_i, max_j, l1, l2)];
    mat[ matidx( max_i, max_j, l1, l2)] = 1;

    /* Trace back */
    k=0;
    while( stack->sp > 0 ) {
      pair = pop( stack);
      if(0){ printf( "pop "); print_pair( pair);}
      i = pair.i;
      j = pair.j;
      if( path[ matidx( i, j, l1, l2)] == 1 ) {
        if( i>0 && mat[ matidx( i-1, j, l1, l2)] > 0 ) {
          push( stack, setpair( i-1, j));
        }
        qseq[k] = seq1[i];
        mseq[k] = ' ';
        tseq[k] = '-';
        k++;
      }
      else if( path[ matidx( i, j, l1, l2)] == 2 ) {
        if( j>0 && mat[ matidx( i, j-1, l1, l2)] > 0) {
          push( stack, setpair( i, j-1));
        }
        qseq[k] = '-';
        mseq[k] = ' ';
        tseq[k] = seq2[j];
        k++;
      }
      else if( path[ matidx( i, j, l1, l2)] == 0 ) {
        if( i>0 && j>0 && mat[ matidx( i-1, j-1, l1, l2)] > 0 ) {
          push( stack, setpair( i-1, j-1));
        }
        qseq[k] = seq1[i];
        tseq[k] = seq2[j];
        mseq[k] = match_sym( tseq[k], qseq[k], param->score_rec);
        k++;
        if(0)printf( "( ) %d %d\n", j, i );
      }
    }

    if( thescore >= param->min_score ) {
      n_hyp++;
      if( n_hyp <= param->max_res ) {
        qseq[k] = 0;
        mseq[k] = 0;
        tseq[k] = 0;
        outstr[0] = reverse( qseq);
        outstr[1] = reverse( mseq);
        outstr[2] = reverse( tseq);
        if( outfile != NULL ) {
          fprintf( outfile, "Score= %d\n", thescore);
          print_lines( outfile, outstr, 3);
        }
      }
    }
    annot = empty_str( l1);
    free( annot);
  }
  free( mt);
  free( qseq);
  free( mseq);
  free( tseq);
  free_stack( stack);
  return firstscore;
}

void out_align_fill( param_type *param, int l1, int l2, int *idx_seq1,
                     int *idx_seq2, int *mat, int *path,
                     char *seq1, char *seq2) {
  int i, j, k, max_path;
  int score, max_score;
  int h_size = 3;
  int gap_score;
  int match_factor;

  /*
     0 = match or mismatch, go diagonally
     1 = insertion in query, go horizontally
     2 = gap in query, go vertically
  */
  for( j=0; j<l2; j++) {
    for( i=0; i<l1; i++) {
      if(1)printf( "%d %d\n", i, j);
      if( path[ matidx( i, j, l1, l2)] == -1 )
        gap_score = param->score_rec->gap_cont;
      else
        gap_score = param->score_rec->gap_start;
      max_score = mat[ matidx( i, j, l1, l2)] - gap_score;
      max_path = -1;

      if( path[ matidx( i, j+1, l1, l2)] == -2 )
        gap_score = param->score_rec->gap_cont;
      else
        gap_score = param->score_rec->gap_start;
      score = mat[ matidx( i, j+1, l1, l2)] - gap_score;
      if( score > max_score) { max_score = score; max_path = -2;}

      if( i-j <= h_size ) {
        score = param->score_rec->loop_score;
      }
      else {
        if( path[ matidx( i-1, j+1, l1, l2)] == 0 )
          match_factor = 1;
        else
          match_factor = param->score_rec->match_cont_factor;
        score = mat[ matidx( i-1, j+1, l1, l2)] + match_factor *
          param->score_rec->mat[ idx_seq1[i] *
                                 param->score_rec->alph_len + idx_seq2[j]];
      }
      if(0) {
        printf( "%d %d %d %d %d %d %d\n", i, j, idx_seq1[i], idx_seq2[j],
                param->score_rec->alph_len,
                param->score_rec->mat[ idx_seq1[i] *
                                       param->score_rec->alph_len +
                                       idx_seq2[j]],
                idx_seq1[i] * param->score_rec->alph_len + idx_seq2[j]);
      }

      if( score > max_score) { max_score = score; max_path = 0;}

      for( k=j+1; k<i; k++) {
        score = mat[ matidx( i, k+1, l1, l2)] + mat[ matidx( k, j, l1, l2)]
          - param->score_rec->gap_start;
        if( score > max_score) { max_score = score; max_path = k; }
        if(0)printf( "%d %d %d %d %d\n", i, k, k, j, score);
      }

      if( max_score > 0 ) {
        mat[ matidx( i, j, l1, l2)] = max_score;
      }
      else {
        mat[ matidx( i, j, l1, l2)] = 0;
      }
      path[ matidx( i, j, l1, l2)] = max_path;
      if(0) {
        printf( "%c%c %3d %3d score: %3d path: %3d\n", seq1[i], seq2[j],
                i, j, max_score, max_path);
      }
    }
  }
}


/* Fill the local alignment score matrix mat and the direction matrix
   path for seq1 (columns) against seq2 (rows). */
void align_fill( param_type *param, int l1, int l2, int *idx_seq1,
                 int *idx_seq2, int *mat, int *path,
                 char *seq1, char *seq2) {
  int i, j;
  char c1[2], c2[2];
  int i1, i2;
  int sm, si, sd, max_s;
  int dir;

  if( verbose )
    printf( "%d %d\n", l1, l2);

  /*
     0 = match or mismatch, go diagonally
     1 = insertion in query, go horizontally
     2 = gap in query, go vertically
  */

  for( j=0; j<l2; j++) {
    c2[0] = seq2[j];
    c2[1] = 0;
    i2 = strcspn( param->score_rec->alph, c2);
    for( i=0; i<l1; i++) {
      c1[0] = seq1[i];
      c1[1] = 0;
      i1 = strcspn( param->score_rec->alph, c1);

      if( i > 0 && j > 0) {
        sm = mat[(j-1)*l1+(i-1)];
      }
      else {
        if( i > 0 ) {
          sm = param->score_rec->gap_start +
            (i-1) * param->score_rec->gap_cont;
        }
        else if( j > 0 ) {
          sm = param->score_rec->gap_start +
            (j-1) * param->score_rec->gap_cont;
        }
        sm = 0;
      }
      sm += param->score_rec->mat[ i2*param->score_rec->alph_len+i1];

      /*if( (i > 0) && (path[ j*l1+(i-1)] > 0)) { */
      if( i > 0 ) {
        si = mat[j*l1+(i-1)];
      }
      else {
        si = 0;
      }
      if( (i>0 && path[j*l1+(i-1)] > 0) || (j>0 && path[(j-1)*l1+i] > 0)) {
        si -= param->score_rec->gap_cont;
      }
      else {
        si -= param->score_rec->gap_start;
      }

      /*if( (j > 0) && (path[ (j-1)*l1+i] > 0)) { */
      if( j > 0 ) {
        sd = mat[(j-1)*l1+i];
      }
      else {
        sd = 0;
      }
      sd -= param->score_rec->gap_cont;

      max_s = sm;
      dir = 0;
      if( si > max_s) {
        max_s = si;
        dir = 1;
      }
      if( sd > max_s) {
        max_s = sd;
        dir = 2;
      }
      mat[j*l1+i] = MAX( max_s, 0);
      path[j*l1+i] = dir;
      /*printf( "%s %s %d %d %d %d\n", c1, c2, i1, i2,
                score, mat[]); */
    }
  }
  if( 0) {
    print_mat( seq1, seq2, mat);
    printf( "\n");
    print_mat( seq1, seq2, path);
    printf( "\n");
  }
}
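
`align_fill` follows the usual local alignment recurrence: each cell takes the best of a diagonal, a horizontal and a vertical move, and is clamped at zero. The sketch below shows that recurrence in its simplest form. It is a simplification under stated assumptions: the name `local_align_score`, the match and mismatch scores, and the single linear gap penalty are hypothetical stand-ins for the substitution matrix and the separate `gap_start` and `gap_cont` penalties of the listing.

```c
#include <stdlib.h>
#include <string.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Best local alignment score of a against b, or -1 if allocation fails.
   match and mismatch are added on the diagonal; gap is subtracted for
   each horizontal or vertical step. */
static int local_align_score(const char *a, const char *b,
                             int match, int mismatch, int gap) {
    size_t la = strlen(a), lb = strlen(b);
    size_t w = la + 1;  /* row width including the zero border */
    size_t i, j;
    int best = 0;
    int *h = calloc(w * (lb + 1), sizeof *h);  /* border cells stay 0 */

    if (h == NULL)
        return -1;
    for (j = 1; j <= lb; j++) {
        for (i = 1; i <= la; i++) {
            int diag = h[(j - 1) * w + (i - 1)] +
                       (a[i - 1] == b[j - 1] ? match : mismatch);
            int left = h[j * w + (i - 1)] - gap;
            int up   = h[(j - 1) * w + i] - gap;
            int s = max2(0, max2(diag, max2(left, up)));
            h[j * w + i] = s;   /* clamp at zero: alignments may restart */
            if (s > best)
                best = s;
        }
    }
    free(h);
    return best;
}
```

The extra border row and column replace the `i > 0` and `j > 0` tests of the listing. A traceback, as in `align_local_traceback`, would additionally need the direction chosen in each cell.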


/* printf( "score= %d\n", mat[l1*l2-1]); */


/* Align seq1 against seq2, report the local alignments to outfile and
   return the best score. */
int align( FILE *outfile, param_type *param, char *seq1, char *seq2) {
  int l1, l2;
  int *mat, *path;
  char *mask_seq1, *mask_seq2;
  int *idx_seq1 = NULL, *idx_seq2 = NULL;
  int score = INT_MAX;

  l1 = strlen( seq1);
  l2 = strlen( seq2);
  mat = calloc( matsize( l1, l2), sizeof( int));
  path = calloc( matsize( l1, l2), sizeof( int));
  mask_seq1 = calloc( l1+1, sizeof( char));
  mask_seq2 = calloc( l2+1, sizeof( char));
  strcpy( mask_seq1, seq1);
  strcpy( mask_seq2, seq2);

  if(0) {
    print_mat( seq1, seq2, mat);
    printf( "\n");
    if(1) print_mat( seq1, seq2, path);
    printf( "\n");
  }

  align_fill( param, l1, l2, idx_seq1, idx_seq2, mat, path,
              mask_seq1, mask_seq2);
  score = align_local_traceback( outfile, param, l1, l2,
                                 idx_seq1, idx_seq2,
                                 mask_seq1, mask_seq2, seq1, seq2, mat, path);
  free( mat);
  free( path);
  free( mask_seq1);
  free( mask_seq2);
  return score;
}



/******* Nussinov - secondary structure prediction ******/

int nussinov_local_traceback( FILE *outfile, param_type *param,
                              int depth,
                              int l, int *idx_seq, char *mask_seq, char *seq,
                              int *mat, int *path) {
  stack_type *stack;
  int i, j, k;
  pair_type pair;
  char *annot;
  matrix_type *mt;
  char *mask;
  int score, maxscore, thescore=INT_MAX, max_i, max_j;
  int maxpath;
  int overlap = 0;
  int n_hyp = 0;
  char *outstr[2];

  stack = init_stack( STACK_CHUNK);
  mt = calloc( 1, sizeof( matrix_type));
  mt->d = mat;
  mt->l1 = l;
  mt->l2 = l;

  annot = empty_str( l);
  mask = empty_str( l);

  while( thescore >= param->min_score && n_hyp < param->max_res )
  {
    maxscore = INT_MIN;
    for( j=0; j<l; j++) {
      if( mask[j] == ' ') {
        for( i=j; i<depth + j + 1 && i<l; i++) {
          if( mask[i] == ' ') {
            score = mat[ matidx( i, j, l, l)];
            if( score > maxscore) {
              maxscore = score;
              max_i = i;
              max_j = j;
            }
          }
        }
      }
    }
    push( stack, setpair( max_i, max_j));
    mask[ max_i] = 's';
    mask[ max_j] = 's';

    /* Trace back */
    while( stack->sp > 0 ) {
      pair = pop( stack);
      if(0){ printf( "pop "); print_pair( pair);}
      i = pair.i;
      j = pair.j;
      if( i > j ) {
        if( path[ matidx( i, j, l, l)] == -1 ) {
          if( (overlap || mask[i-1] == ' ') &&
              mat[ matidx( i-1, j, l, l)] > 0) {
            push( stack, setpair( i-1, j));
          }
          mask[i] = 'i';
        }
        else if( path[ matidx( i, j, l, l)] == -2 ) {
          if( (overlap || mask[j+1] == ' ') &&
              mat[ matidx( i, j+1, l, l)] > 0) {
            push( stack, setpair( i, j+1));
          }
          mask[j] = 'd';
        }
        else if( path[ matidx( i, j, l, l)] == 0 ) {
          if( (overlap || (mask[i-1] == ' ' && mask[j+1] == ' ')) &&
              mat[ matidx( i-1, j+1, l, l)] > 0 ) {
            push( stack, setpair( i-1, j+1));
          }
          if( param->score_rec->match_threshold <=
              param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len
                                     + idx_seq[j]] ) {
            annot[i] = ')';
            annot[j] = '(';
          }

else { 

30 annot [i) = 1 

annot [j) = ■ 

} 

mask_seg[i] = • 
mask_seq( j J = * 
3 5 mask [ i ] = 1 m ' ; 

mask[ j ] - «m' ; 

if(0)printf< ■ ( ) %d %d\n" , j , i ); 

) 

else { 

40 k a path! matidx( i, j. 1, 1)3; 

push( stack, setpairf k , j)) ; 
push( stack, setpair( i , k+1)); 

) 

} 

45 } 
if (0) { 

/* Trace forward */ 


50 


55 


push( stack, setpair( max_i, max_j ) ) ; 

while ( stack->sp > 0 ) ( 
pair = pop( stack) ; 

if(l){ printf( "pop : b ); print_pair ( pair);} 

i = pair.i; 
j = pair. j ; 

if ( i<depth + j && i<l-l && j>0 ) { 
maxscore = 0 ; 
maxpath = -3; 
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if( path[ matidx( i+1, j, 1, 1) ] == -X && 
( overlap || mask(i+l] == ' * ) && 
xnat[ matidxf i + 1, j, 1, 1)] >= maxscore ) { 
maxscore = mat ( matidx( i+1, j, 1,1)]; 
5 maxpath = -1; 

) 

if( pathC matidx( i, j-1, 1, 1)) == -2 && 
( overlap || mask[j-l] == • ■ ) && 
mate matidxl i, j-1, 1, 1)] >= maxscore ) { 
10 maxscore = mat [ matidx( i, j-1, 1, 1 ) ] ; 

maxpath = -2 ; 

) 

if{ path[ matidx( i+1, 1, l) ] ==0 && 

{ overlap | | <mask[i+l] == 1 r && mask( j-1) = 

15 ')) && 

- - mat[ matidx<„i+l, j-1, ;1, 1)] >=: maxscore ) {. _ 

maxscore = mat [ matidx( i+1, j-1, 1, 1)]; 
maxpath = 0 ; 

) 

20 if( path[ matidx< i + 1, j-1, 1, 1)] > 0 && 

mat[ matidxf i+1, j-1, 1, 1)} >= maxscore ) { 
maxscore = mat [ matidx( i + 1, j-1, 1, 1)]; 
maxpath a path[ matidx( i+1, j-1, 1, 1)]- 

} 

25 

if ( maxpath == -1 ) { 

push< stack, setpair< i+1, j)) ; 
mask [i+1] = ■ r * ; 

) 

30 else if ( maxpath == -2 ) { 

push< stack, setpair( i, j-1)); 
mask[j-l] = 'D'; 

) 

else if ( maxpath == 0 ) { 
3 5 push( stack, setpair( i+1, j-1)); 

if( param->score_rec->match__threshold <= 

param->score_rec->mat [ idx_seq[i+l] * param- 
>score_rec->alph_len + 

idx_seq[j-m) { 
40 annot(i+l) = • ) » ; 

annot[j-l] = ■ < • ; 

) 

else { 

annot [i + 1] = • . 1 ; 
45 annot[j-l] = 

) , 

mask [ i+1) a 'M' ; 
mask[j-l] = ' M' ; 

if(0)printf( ■ ( ) %d%d\n", j,i ); 

50 ) 

else if < maxpath > 0 ) ( 
k = maxpath; 

push( stack, setpair( k , j-1)); 
push( stack, setpair( i+1 , k+1) ) ; 

55 } 

) 
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thescore = matt matidx{ max_i, max_j , 1, 1) ] ; 
if ( thescore >= param->rain_score J { 

«_hyp++; 

if ( n_hyp <= par am- >max_res ) { 
if(0>printf< '%s\n" ( seq ); 
if(0)printf( °%s\n n , annot ); 
outstr[0J = seq; 
outstrfl] = annot; 

if ( outfile != NULL ) { 
fprintfC outfile, "Score= %d\n" , thescore}; 
print _lines { outfile, outstr, 2); 
} 

> 

) 

free I annot) ; 

annot. = emp ty_s t r. < _ JX )_;__ _ 

) 


free ( mt ) ; 
free ( annot ) ; 
free ( mask) ; 
free__stack( stack); 
return thescore; 


char *nussinov_traceback( FILE *outfile, param_type *param,
                          int depth, int l, int *idx_seq,
                          char *seq, int *mat, int *path) {
  stack_type *stack;
  int i, j, k, d;
  pair_type pair;
  char *annot;
  matrix_type *mt;
  int gap_score1, gap_score2;

  stack = init_stack( STACK_CHUNK);
  mt = calloc( 1, sizeof( matrix_type));
  mt->d = mat;
  mt->l1 = l;
  mt->l2 = l;

  annot = empty_str( l);

  for( d=depth; d<l; d++) {
    push( stack, setpair( d, d-depth));

    while( stack->sp > 0 ) {
      pair = pop( stack);
      if(0){ printf( "pop "); print_pair( pair);}
      i = pair.i;
      j = pair.j;
      if( i > j ) {
        if( path[ matidx( i, j, l, l)] == -1 ) {
          push( stack, setpair( i-1, j));
        }
        else if( path[ matidx( i, j, l, l)] == -2 ) {
          push( stack, setpair( i, j+1));
        }
        else if( path[ matidx( i, j, l, l)] == 0 ) {
          if( param->score_rec->match_threshold <=
              param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len
                                     + idx_seq[j]]) {
            annot[i] = ')';
            annot[j] = '(';
          }
          else {
            /* characters illegible in the scan; '.' assumed */
            annot[i] = '.';
            annot[j] = '.';
          }
          if(0)printf( "( ) %d %d\n", j, i);
          push( stack, setpair( i-1, j+1));
        }
        else {
          k = path[ matidx( i, j, l, l)];
          push( stack, setpair( k, j));
          push( stack, setpair( i, k+1));
        }

        if(0){
          if( path[ matidx( i-1, j, l, l)] == -1 )
            gap_score1 = param->score_rec->gap_cont;
          else
            gap_score1 = param->score_rec->gap_start;
          if( path[ matidx( i, j+1, l, l)] == -2 )
            gap_score2 = param->score_rec->gap_cont;
          else
            gap_score2 = param->score_rec->gap_start;

          if( m( i-1, j, mt) ==
              m( i, j, mt) - gap_score1) {
            if(0)printf( " mat: -1 %d %d\n", m( i-1, j, mt), m( i, j, mt));
            push( stack, setpair( i-1, j));
          }
          else if( m( i, j+1, mt) ==
                   m( i, j, mt) - gap_score2) {
            if(0)printf( " mat: -2 %d %d\n", m( i-1, j, mt), m( i, j, mt));
            push( stack, setpair( i, j+1));
          }
          else if( (mat[ matidx( i-1, j+1, l, l)] +
                    param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len
                                           + idx_seq[j]]) ==
                   mat[ matidx( i, j, l, l)]) {
            if(0)printf( " mat: 0 %d %d\n", m( i-1, j, mt), m( i, j, mt));
            annot[i] = ')';
            annot[j] = '(';
            if(0)printf( "( ) %d %d\n", j, i);
            push( stack, setpair( i-1, j+1));
          }
          else {
            for( k=j+1; k<i; k++) {
              if(0)printf( " %d %d %d %d\n", i, k, k, j);
              if( mat[ matidx( i, k+1, l, l)] + mat[ matidx( k, j, l, l)] ==
                  mat[ matidx( i, j, l, l)]) {
                if(0)printf( " mat: %d %d %d\n", k, m( i-1, j, mt),
                             m( i, j, mt));
                push( stack, setpair( k, j));
                push( stack, setpair( i, k+1));
                break;
              }
            }
          }
        }
      }
    }

    fprintf( outfile, "Score = %d\n", mat[ matidx( d, d-depth, l, l)]);
    printf( "%d>%s\n", d, seq);
    printf( "%d>%s\n", d, annot);
    free( annot);
    annot = empty_str( l);
  }

  free_stack( stack);
  return annot;
}


void nussinov_fill( param_type *param, int depth, int l, int *idx_seq,
                    int *mat, int *path, char *seq) {
  int d;
  int i, j, k, max_path;
  int score, max_score;
  int h_size = 3;
  int gap_score;
  int match_factor;

  for( d=0; d<depth; d++) {
    for( i=d+1; i<l; i++) {
      if(0)printf( "%d %d %d\n", d, i, depth);
      j = i - 1 - d;

      if( path[ matidx( i-1, j, l, l)] == -1 )
        gap_score = param->score_rec->gap_cont;
      else
        gap_score = param->score_rec->gap_start;
      max_score = mat[ matidx( i-1, j, l, l)] - gap_score;
      max_path = -1;

      if( path[ matidx( i, j+1, l, l)] == -2 )
        gap_score = param->score_rec->gap_cont;
      else
        gap_score = param->score_rec->gap_start;
      score = mat[ matidx( i, j+1, l, l)] - gap_score;
      if( score > max_score) { max_score = score; max_path = -2;}

      if( i-j <= h_size ) {
        score = param->score_rec->loop_score;
      }
      else {
        if( path[ matidx( i-1, j+1, l, l)] == 0 )
          match_factor = 1;
        else
          match_factor = param->score_rec->match_cont_factor;
        score = mat[ matidx( i-1, j+1, l, l)] +
          match_factor *
          param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len +
                                 idx_seq[j]];
      }
      if(0)printf( "%d %d %d %d %d %d %d\n", i, j,
                   idx_seq[i], idx_seq[j],
                   param->score_rec->alph_len,
                   param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len +
                                          idx_seq[j]],
                   idx_seq[i] * param->score_rec->alph_len + idx_seq[j]);
      if( score > max_score) { max_score = score; max_path = 0;}

      for( k=j+1; k<i; k++) {
        score = mat[ matidx( i, k+1, l, l)] + mat[ matidx( k, j, l, l)]
                - param->score_rec->gap_start;
        if( score > max_score ) { max_score = score; max_path = k;}
        if(0)printf( "%d %d %d %d %d\n", i, k, k, j, score);
      }
      if( max_score > 0 ) {
        mat[ matidx( i, j, l, l)] = max_score;
      }
      else {
        mat[ matidx( i, j, l, l)] = 0;
      }
      path[ matidx( i, j, l, l)] = max_path;
      if(0)printf( "%c%c %3d %3d score: %3d path: %3d\n",
                   seq[i], seq[j], i, j, max_score, max_path);
    }
  }
}
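
The fill routine above considers four cases per cell: leave position i unpaired, leave position j unpaired, pair i with j, or split the interval at some k. A minimal sketch of that recurrence in its textbook form follows, assuming a plain base-pair count instead of the listing's hybridization scores and gap penalties. The names nussinov_max_pairs, can_pair and min_loop are hypothetical, and min_loop (assumed non-negative) plays roughly the role of h_size.

```c
#include <stdlib.h>
#include <string.h>

/* Watson-Crick pairs on a lower-case DNA alphabet (illustrative only). */
static int can_pair(char x, char y)
{
    return (x == 'a' && y == 't') || (x == 't' && y == 'a') ||
           (x == 'c' && y == 'g') || (x == 'g' && y == 'c');
}

/* Maximum number of nested base pairs in seq, requiring i - j > min_loop
 * for every pair (j, i). Cell (i, j) with i > j is stored at m[i * n + j],
 * and cells are filled diagonal by diagonal, as in the listing. */
int nussinov_max_pairs(const char *seq, int min_loop)
{
    int n = (int)strlen(seq);
    int *m;
    int result;

    if (n == 0)
        return 0;
    m = calloc((size_t)n * (size_t)n, sizeof(int));
    if (m == NULL)
        return -1;

    for (int d = min_loop + 1; d < n; d++) {
        for (int j = 0; j + d < n; j++) {
            int i = j + d;
            int best = m[(i - 1) * n + j];          /* i unpaired */
            int s = m[i * n + (j + 1)];             /* j unpaired */

            if (s > best) best = s;
            if (can_pair(seq[i], seq[j])) {         /* i pairs with j */
                s = m[(i - 1) * n + (j + 1)] + 1;
                if (s > best) best = s;
            }
            for (int k = j + 1; k < i; k++) {       /* bifurcation at k */
                s = m[i * n + (k + 1)] + m[k * n + j];
                if (s > best) best = s;
            }
            m[i * n + j] = best;
        }
    }
    result = m[(n - 1) * n];
    free(m);
    return result;
}
```

For a hairpin-like input such as "gggaaaccc" with min_loop set to 3, the three g-c pairs enclose the aaa loop.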

int nussinov( FILE *outfile, param_type *param, char *seq) {
  int l, i;
  int *mat, *path;
  int *idx_seq;
  int score = INT_MAX;
  char *mask_seq;
  int max_score = 0;
  int depth = param->depth;

  l = strlen( seq);
  if( depth > l-1 ) { depth = l-1;}
  mat = calloc( matsize( l, l), sizeof( int));
  path = calloc( matsize( l, l), sizeof( int));
  mask_seq = calloc( l+1, sizeof( char));
  strcpy( mask_seq, seq);

  if(0) {
    print_mat( seq, seq, mat);
    printf( "\n");
    if(0)print_mat( seq, seq, path);
    printf( "\n");
  }
  if( 0) {
    idx_seq = seq2idx_seq( seq, param->score_rec);
    nussinov_fill( param, depth, l, idx_seq, mat, path, seq);
    nussinov_traceback( outfile, param, depth, l, idx_seq,
                        seq, mat, path);
  }
  else {
    i = 0;
    while( score > param->min_score && i < param->max_res ) {
      idx_seq = seq2idx_seq( mask_seq, param->score_rec);
      nussinov_fill( param, depth, l, idx_seq, mat, path,
                     mask_seq);
      score = nussinov_local_traceback( outfile, param,
                                        depth, l,
                                        idx_seq, mask_seq, seq, mat, path);
      i++;
      if( score > max_score) max_score = score;
    }
    if( outfile != NULL ) {
      printf( "Mask: \n");
      if( score >= param->min_score )
        print_lines( outfile, &mask_seq, 1);
    }
  }
  free( mat);
  free( path);
  free( idx_seq);
  free( mask_seq);
  return max_score;
}


/****** OLIGO ******/

void calc_olig( int id, char *seq, char *rev_seq, param_type *param) {
  int score_self, score_target, score_struc;

  strcpy( rev_seq, seq);
  reverse( rev_seq);
  score_self = align( NULL, param, seq, rev_seq);
  lower( rev_seq);
  comp( rev_seq);
  reverse( rev_seq);
  score_target = align( NULL, param, seq, rev_seq);
  score_struc = nussinov( NULL, param, seq);
  printf( "%7d %s %7d %7d %7d\n", id, seq, score_self,
          score_target,
          score_struc);
}
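
calc_olig() builds the target strand by applying comp() and reverse() to a copy of the oligo. Those helpers are not shown in this part of the listing, so the following is only a sketch of a reverse complement under simple assumptions: the names reverse_complement and complement_base are hypothetical, case is preserved (the listing lower-cases the copy first), and characters outside acgtACGT are left unchanged.

```c
#include <string.h>

/* Complement of a single base; anything else is returned unchanged. */
static char complement_base(char c)
{
    switch (c) {
    case 'a': return 't';
    case 't': return 'a';
    case 'c': return 'g';
    case 'g': return 'c';
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return c;
    }
}

/* Reverse-complement s in place by swapping complemented ends inwards. */
void reverse_complement(char *s)
{
    size_t n = strlen(s);

    for (size_t i = 0; i < n / 2; i++) {
        char tmp = complement_base(s[i]);
        s[i] = complement_base(s[n - 1 - i]);
        s[n - 1 - i] = tmp;
    }
    if (n % 2 == 1)                      /* middle base of an odd length */
        s[n / 2] = complement_base(s[n / 2]);
}
```

The buffer passed in must be writable, for example a char array initialised from a string literal rather than the literal itself.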

void allolig( FILE *outfile, int oligolen, int oligosample, int spikefreq,
              param_type *param) {
  char alph[255] = "acgtACGT";
  long n;
  char *seq, *rev_seq;
  long j;

  rev_seq = empty_str( oligolen);
  param->min_score = 2;
  param->max_res = 1;

  if( oligosample ) {
    for( j=0; j<oligosample; j++) {
      seq = make_random_nmer( oligolen, spikefreq, alph);
      calc_olig( j, seq, rev_seq, param);
      free( seq);
    }
  }
  else {
    n = n_nmer( oligolen, alph);
    for( j=0; j<n; j++) {
      seq = make_nmer( j, oligolen, alph);
      calc_olig( j, seq, rev_seq, param);
      free( seq);
    }
  }
  free( rev_seq);
}


/****** ARGUMENT PARSING AND INITIALIZATION ******/

void add_seq2seqs( char *seq, sequences_type *seqs) {
  int new_size, i;

  if( seqs->nseq < seqs->maxnseq ) {
    seqs->seqs[ seqs->nseq++] = seq;
  }
  else {
    new_size = (seqs->maxnseq + SEQS_CHUNK) * sizeof( char *);
    if( seqs->seqs = realloc( seqs->seqs, new_size) ) {
      seqs->maxnseq = seqs->maxnseq + SEQS_CHUNK;
      for( i=seqs->nseq; i<seqs->maxnseq; i++)
        seqs->seqs[i] = NULL;
      seqs->seqs[ seqs->nseq++] = seq;
    }
    else {
      die( "Failed to realloc seq array to %d bytes", new_size);
    }
  }
}
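
add_seq2seqs() grows the sequence array in fixed chunks with realloc(). A minimal sketch of the same pattern is given below, with the realloc() result held in a temporary so that the old block is still reachable if the call fails. The type str_list, the function str_list_push and the constant CHUNK are hypothetical stand-ins for sequences_type, add_seq2seqs and SEQS_CHUNK, not the listing's definitions.

```c
#include <stdlib.h>

#define CHUNK 4   /* hypothetical growth step, standing in for SEQS_CHUNK */

typedef struct {
    char **items;  /* array of string pointers, unused slots are NULL */
    int n;         /* number of used slots */
    int cap;       /* number of allocated slots */
} str_list;

/* Append item to list. Returns 0 on success and -1 if memory could not
 * be obtained; on failure the list is left unchanged. */
int str_list_push(str_list *list, char *item)
{
    if (list->n == list->cap) {
        int new_cap = list->cap + CHUNK;
        char **tmp = realloc(list->items, (size_t)new_cap * sizeof *tmp);

        if (tmp == NULL)
            return -1;
        for (int i = list->cap; i < new_cap; i++)
            tmp[i] = NULL;               /* clear the newly added slots */
        list->items = tmp;
        list->cap = new_cap;
    }
    list->items[list->n++] = item;
    return 0;
}
```

A list initialised as { NULL, 0, 0 } can be pushed to directly, because realloc() with a null pointer behaves like malloc(); the caller releases the array with free(list.items).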


void read_fasta( FILE *seqfile, sequences_type *seqs, char *alph) {
  int c;
  char name[255];
  char comment[255];
  char s[255];
  int seq_flag = 0;
  dynamic_str_type *seq = NULL;

  while( (c = fgetc( seqfile)) != EOF) {
    if( c == '>') {
      if( seq != NULL && seq->len > 0 ) {
        add_seq2seqs( seq->s, seqs);
        free( seq);
        seq = new_dynamic_str_type( );
      }
      fgets( s, 254, seqfile);
      sscanf( s, "%s%s", name, comment);
      seq_flag = 1;
    }
    else if( seq_flag == 1 ) {
      if( strchr( alph, c) != NULL ) {
        seq = addchar( seq, c);
      }
    }
  }
  if( seq != NULL && seq->len > 0 )
    add_seq2seqs( seq->s, seqs);
  free( seq);
}


param_type *init( ) {
  char a[255];
  int i, j;
  param_type *param;
  int *mat;
#define ALPHLEN 9
  int mathyp_init[ALPHLEN][ALPHLEN] = {
    { -3, -3, -3,  3, -5, -5, -5,  5, -99},
    { -3, -3,  5, -3, -5, -5,  9, -5, -99},
    { -3,  5, -3,  1, -5,  9, -5,  2, -99},
    {  3, -3,  1, -3,  5, -5,  2, -5, -99},
    { -5, -5, -5,  5, -8, -8, -8,  8, -99},
    { -5, -5,  9, -5, -8, -8, 14, -8, -99},
    { -5,  9, -5,  2, -8, 14, -8,  4, -99},
    {  5, -5,  2, -5,  8, -8,  4, -8, -99},
    {-99,-99,-99,-99,-99,-99,-99,-99, -99}
  };

  param = calloc( 1, sizeof( param_type));

  /* alph */
  strcpy( a, "acgtACGT-");
  param->alph = calloc( strlen( a) + 1, sizeof( char));  /* +1 for the terminating NUL */
  strcpy( param->alph, a);

  /* seqs */
  param->seqs = calloc( 1, sizeof( sequences_type));
  param->seqs->nseq = 0;
  param->seqs->maxnseq = SEQS_CHUNK;
  param->seqs->seqs = calloc( param->seqs->maxnseq, sizeof( char *));
  param->seqs2 = calloc( 1, sizeof( sequences_type));
  for( i=0; i<param->seqs->maxnseq; i++)
    param->seqs->seqs[i] = NULL;
  param->seqs2->nseq = 0;
  param->seqs2->maxnseq = SEQS_CHUNK;
  param->seqs2->seqs = calloc( param->seqs2->maxnseq, sizeof( char *));
  for( i=0; i<param->seqs2->maxnseq; i++)
    param->seqs2->seqs[i] = NULL;

  /* hybridization matrix */
  mat = calloc( ALPHLEN * ALPHLEN, sizeof( int));
  for( i=0; i<ALPHLEN; i++) {
    for( j=0; j<ALPHLEN; j++) {
      mat[ i * ALPHLEN + j] = mathyp_init[i][j];
    }
  }

  /* defaults */
  param->depth = INT_MAX;
  param->min_score = 20;
  param->max_res = INT_MAX;

  param->score_rec = init_score_rec_type(
      14, 7,
      -40, 2, 1,
      8, 3, 1,
      param->alph, mat);

  validate_score_rec_type( param->score_rec, -200, 200);

  return param;
}
int main( int argc, char *argv[]) {
  int i;
  char *rev_seq;
  param_type *param;
  int algo = 0;
  char *options;
  int next_option;
  char *outfn = NULL;
  char *infn = NULL;
  FILE *outfile, *seqfile;
  int othermethod = 0;
  int oligolen = 7;
  int oligosample = 0;
  int spikefreq = 0;

  const char* const short_options = "hvruayto:i:j:s:z:d:n:p:m:l:e:f:";

  const struct option long_options[] = {
    { "help",                0, NULL, 'h' },
    { "verbose",             0, NULL, 'v' },
    { "version",             0, NULL, 'r' },
    { "secondary_structure", 0, NULL, 'u' },
    { "self_anealing",       0, NULL, 'a' },
    { "hybridization",       0, NULL, 'y' },
    { "tm",                  0, NULL, 't' },
    { "output",              1, NULL, 'o' },
    { "input",               1, NULL, 'i' },
    { "seq",                 1, NULL, 's' },
    { "seq2",                1, NULL, 'z' },
    { "depth",               1, NULL, 'd' },
    { "min_score",           1, NULL, 'n' },
    { "max_res",             1, NULL, 'p' },
    { "matrix",              1, NULL, 'm' },
    { "allolig",             1, NULL, 'l' },
    { "sample",              1, NULL, 'e' },
    { "spikefreq",           1, NULL, 'f' },
    { NULL,                  0, NULL, 0 }   /* Required at end of array */
  };

  /* mtrace( ); debug memory leaks under linux */
  param = init( );
  program_name = argv[0];
  do {
    next_option = getopt_long( argc, argv, short_options,
                               long_options, NULL);
    switch( next_option)
    {
    case 'h':   /* -h or --help */
      usage( "");

    case 'i':   /* -i or --input */
      infn = optarg;
      seqfile = sfopen( infn, "r");
      read_fasta( seqfile, param->seqs, param->alph);
      fclose( seqfile);
      break;

    case 'j':   /* -j or --input2 */
      infn = optarg;
      seqfile = sfopen( infn, "r");
      read_fasta( seqfile, param->seqs2, param->alph);
      fclose( seqfile);
      break;

    case 'o':   /* -o or --output */
      outfn = optarg;
      break;

    case 'm':   /* -m or --matrix */
      options = optarg;
      read_score_rec_type( options, param->score_rec);
      break;

    case 's':   /* -s or --seq */
      options = optarg;
      add_seq2seqs( options, param->seqs);
      break;

    case 'z':   /* -z or --seq2 */
      options = optarg;
      add_seq2seqs( options, param->seqs2);
      break;

    case 'd':   /* -d or --depth */
      options = optarg;
      param->depth = atoi( options);
      break;

    case 'p':   /* -p or --max_res */
      options = optarg;
      param->max_res = atoi( options);
      break;

    case 'n':   /* -n or --min_score */
      options = optarg;
      param->min_score = atoi( options);
      break;

    case 'v':   /* -v or --verbose */
      verbose += 1;
      break;

    case 'r':   /* -r or --version */
      printf( "dyp version %s\n", THEVERSION);
      break;

    case 'u':   /* -u or --secondary_structure */
      algo = algo | SECONDARY_STRUCTURE;
      break;

    case 'a':   /* -a or --self_anealing */
      algo = algo | SELF_ANEALING;
      break;

    case 'y':   /* -y or --hybridization */
      algo = algo | HYBRIDIZATION;
      break;

    case 't':   /* -t or --tm */
      algo = algo | TM;
      break;

    case 'l':   /* -l or --allolig */
      options = optarg;
      oligolen = atoi( options);
      othermethod = 1;
      break;

    case 'e':   /* -e or --sample */
      options = optarg;
      oligosample = atoi( options);
      break;

    case 'f':   /* -f or --spikefreq */
      options = optarg;
      spikefreq = atoi( options);
      break;

    case '?':   /* The user specified an invalid option. */
      usage( "Sorry, one of the options was not recognized");

    case -1:    /* Done with options. */
      break;

    default:    /* Something else: unexpected. */
      abort( );
    }
  }
  while( next_option != -1);

  if( outfn == NULL )
    outfile = stdout;
  else
    outfile = sfopen( outfn, "w");

  if( verbose)
    print_score_rec_type( param->score_rec);

  if( othermethod) {
    allolig( outfile, oligolen, oligosample, spikefreq, param);
  }
  else {
    if( !algo )
      usage( "please select a method: u, a or y");

    for( i=0; i<param->seqs->nseq; i++) {
      fprintf( outfile, ">sequence_%03d\n", i+1);
      if( algo & SECONDARY_STRUCTURE )
        nussinov( outfile, param, param->seqs->seqs[i]);
      if( algo & SELF_ANEALING ) {
        /* +1 leaves room for the terminating NUL copied by strcpy */
        rev_seq = calloc( strlen( param->seqs->seqs[i]) + 1, sizeof( char));
        strcpy( rev_seq, param->seqs->seqs[i]);
        reverse( rev_seq);
        if( param->min_score < 2 ) {
          param->min_score = 2;
        }
        align( outfile, param, param->seqs->seqs[i], rev_seq);
        free( rev_seq);
      }
      if( algo & TM ) {
        rev_seq = calloc( strlen( param->seqs->seqs[i]) + 1, sizeof( char));
        strcpy( rev_seq, param->seqs->seqs[i]);
        lower( rev_seq);
        comp( rev_seq);
        if( param->min_score < 2 ) {
          param->min_score = 2;
        }
        align( outfile, param, param->seqs->seqs[i], rev_seq);
        free( rev_seq);
      }
      if( algo & HYBRIDIZATION ) {
        if( param->seqs2->seqs[i] == NULL)
          die( "Sequence 2 is missing");
        rev_seq = calloc( strlen( param->seqs2->seqs[i]) + 1, sizeof( char));
        strcpy( rev_seq, param->seqs2->seqs[i]);
        reverse( rev_seq);
        align( outfile, param, param->seqs->seqs[i], rev_seq);
        free( rev_seq);
      }
    }
  }
  fclose( outfile);

  free_param_type( param);
  return 0;
}
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< PARAMETER> 
oligo_length 
blastdb 
mindist 

ftblast param 
bias t, par am 
bias t param 
blast_param 
blast, param 
blast_param 


30 

/home/probe/database/bf /c_elegans 
30 

-e 50 -F F -a 4 


cfrecj 

cphase 
15 end_spike_len 
dnaconc 

saltconc 

max_noligo 

maxhits 
20 min_score 


wordlen 

strand 

expect 

nproc 

filter 

3 
0 
0 

.2000... 
115 
10 
8 

35 
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both strands 

5 

2 


#perceptron parameters
oligod_param   max_match    cutoff      30
oligod_param   max_match    squash_dx   5
oligod_param   max_match    squash_dy   0.9
oligod_param   max_match    weight      1
oligod_param   max_stretch  cutoff      20
oligod_param   max_stretch  squash_dx   2
oligod_param   max_stretch  squash_dy   0.9
oligod_param   max_stretch  weight      1
oligod_param   self_hyp     cutoff      25
oligod_param   self_hyp     squash_dx   10
oligod_param   self_hyp     squash_dy   0.9
oligod_param   self_hyp     weight      1
oligod_param   self_match   cutoff      50
oligod_param   self_match   squash_dx   10
oligod_param   self_match   squash_dy   0.9
oligod_param   self_match   weight      0
oligod_param   tm_min       cutoff      40
oligod_param   tm_min       squash_dx   2
oligod_param   tm_min       squash_dy   0.9
oligod_param   tm_min       weight      1
oligod_param   tm_max       cutoff      95
oligod_param   tm_max       squash_dx   2
oligod_param   tm_max       squash_dy   0.9
oligod_param   tm_max       weight      1
oligod_param   tm           cutoff      1.7
oligod_param   tm           squash_dx   0.1
oligod_param   tm           squash_dy   0.9
oligod_param   tm           weight      5
oligod_param   tm_dan_min   cutoff      50
oligod_param   tm_dan_min   squash_dx   2
oligod_param   tm_dan_min   squash_dy   0.9
oligod_param   tm_dan_min   weight      0
oligod_param   tm_dan_max   cutoff      95
oligod_param   tm_dan_max   squash_dx   2
oligod_param   tm_dan_max   squash_dy   0.9
oligod_param   tm_dan_max   weight
oligod_param   tm_dan       cutoff
oligod_param   tm_dan       squash_dx
oligod_param   tm_dan       squash_dy
oligod_param   tm_dan       weight
oligod_param   cg           weight
oligod_param   ggg          weight
oligod_param   hit_score    weight
</PARAMETER>
// $Id: Tm_predictor.java,v 1.6 2002/10/17 14:12:50 jgk Exp $

package tmpred;

import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;

// Compile with:
// /usr/java1.2/bin/javac -classpath .:/usr/java1.2/ tmpred/Tm_predictor.java
// GCC compile:
// gcj --main=tmpred.Tm_predictor tmpred/Tm_predictor.java
//     tmpred/Tm_thermodynamic_model.java tmpred/BadOligoException.java -o Tm_predictor

public abstract class Tm_predictor {
  public abstract double calcTm( String sequence ) throws BadOligoException;

  public abstract String lastSequence();
  public abstract Double lastTm();
  public abstract String getParameter( String paramName );
  public abstract void setParameter( String paramName, String value );
  public abstract Enumeration getParameterNames();

  public static String getVersion() { return "$Id: Tm_predictor.java,v 1.6 2002/10/17 14:12:50 jgk Exp $"; }

  /** Singleton instance. */
  private static Tm_predictor instance = null;

  /** Return a singleton instance of the default implementation.
   * @author Kim Haagensen
   */
  public static Tm_predictor getTm_predictor() {
    if ( instance == null ) {
      instance = getInstance();
    }
    return instance;
  }

  /** Return an instance of the default implementation.
   */
  public static Tm_predictor getInstance() {
    // Default model (for now..)
    return new Tm_thermodynamic_model();
  }

  // For demo purposes
  public static void main(String args[]) {
    Tm_predictor tmmodel=Tm_predictor.getInstance();
    Enumeration pnames=tmmodel.getParameterNames();
    String paramName;

    // tmmodel.setParameter("debug","true");

    System.out.println("Version: "+tmmodel.getVersion());
    System.out.println();
    System.out.println("Constants: ");
    while (pnames.hasMoreElements()) {
      paramName=(String)pnames.nextElement();
      System.out.println(paramName+" = "+tmmodel.getParameter(paramName));
    }

    System.out.println();
    System.out.println("Results: ");
    System.out.println("Input Sequence= "+args[0]);

    try{
      System.out.println("Tm="+tmmodel.calcTm(args[0]));
      System.out.println("Translated sequence: "+tmmodel.lastSequence());
    }
    catch (BadOligoException expt) {
      System.out.println("Error: "+expt.getMessage());
      System.out.println("Tm= "+tmmodel.lastTm());
      System.out.println("Translated Sequence: "+tmmodel.lastSequence());
    }
  }
}




package tmpred;

import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;

class Tm_param {
  public Hashtable table_deltaS_init, table_deltaH_init, table_H_nn,
    table_S_nn, table_H_mono;
};

class Tm_thermodynamic_model extends Tm_predictor {
  private Tm_param modparam[];

  private Hashtable table_LD_translate, table_ws_translate,
    table_ten_translate;
  private Properties parameterProperties;
  private String sequence; // Last translated sequence
  private Double tm; // Last calculated Tm

  // Get methods
  public String lastSequence() { return sequence; }
  public Double lastTm() { return tm; }

  // Property methods
  public String getParameter(String paramName) {
    return (String)parameterProperties.get(paramName);
  }

  public void setParameter(String paramName, String value) {
    parameterProperties.put(paramName, value);
  }

  public Enumeration getParameterNames() {
    return parameterProperties.propertyNames();
  }

  // Utility functions:
  private double hget(Hashtable h, String key) {
    // get double value from hashtable, zero if it wasn't there.
    if (key!=null) {
      Double val=(Double)h.get(key);
      if (val!=null) {
        return val.doubleValue();
      } else {
        return 0.0;
      }
    }
    return 0.0;
  }

  private void hcount(Hashtable h, String key) {
    // Count entry in h(key) up one, if it exists.
    if (key!=null) {
      Double val=(Double)h.get(key);
      if (val!=null) {
        h.put(key, new Double(val.doubleValue()+1));
      } else {
        h.put(key, new Double(1));
      }
    }
  }

  private String trans_LD(String seq) {
    int i;
    String result="";
    for(i=0; i<seq.length(); i++) {
      result=result +
        (String)table_LD_translate.get(seq.substring(i,i+1));
    }
    return result;
  }

  private String trans_ws(String seq) {
    int i;
    String result="";
    for(i=0; i<seq.length(); i++) {
      result=result +
        (String)table_ws_translate.get(seq.substring(i,i+1));
    }
    return result;
  }

  private String findReplace(String inpseq, String find, String replace) {
    String subseq;
    String seq=inpseq;
    int i;
    for(i=0; i<seq.length()-find.length()+1; i++) {
      subseq=seq.substring(i,i+find.length());
      if (subseq.equals(find)) {
        seq=seq.substring(0,i)+replace+seq.substring(i+find.length());
      }
    }
    return seq;
  }

  private void checkSequence(String sequence) throws BadOligoException {
    int goodlength=0;
    int i;
    char mon;
    String badletters="";
    for(i=0; i<sequence.length(); i++){
      mon=sequence.charAt(i);
      switch (mon) {
        case 'a':
        case 'g':
        case 'c':
        case 't':
        case 'A':
        case 'G':
        case 'C':
        case 'T': goodlength++; break;
        default:
          // compare by value, not by reference
          if (badletters.equals("")) {
            badletters=""+mon;
          } else {
            badletters=badletters+","+mon;
          }
      }
    }
    if (sequence.length()!=goodlength) {
      this.tm=null;
      // this.sequence="";
      throw new BadOligoException("Wrong monomer(s) in sequence: "+badletters);
    }
  }
  Tm_thermodynamic_model() {
    parameterProperties=new Properties();
    parameterProperties.put("debug","false"); // Set to true if debugging
    parameterProperties.put("oligo_conc","0.000002"); // =2 micromolar, this is the total strand concentration, when the two complements are in 1 uM each
    parameterProperties.put("salt_conc","0.115"); // =115mM, Exiqon most common concentration

    sequence=null;
    tm=null;

    // Translate tables:
    table_LD_translate=new Hashtable();
    table_LD_translate.put("a","D");
    table_LD_translate.put("g","D");
    table_LD_translate.put("c","D");
    table_LD_translate.put("t","D");
    table_LD_translate.put("A","L");
    table_LD_translate.put("G","L");
    table_LD_translate.put("C","L");
    table_LD_translate.put("T","L");

    table_ws_translate=new Hashtable();
    table_ws_translate.put("a","w");
    table_ws_translate.put("g","s");
    table_ws_translate.put("c","s");
    table_ws_translate.put("t","w");
    table_ws_translate.put("A","W");
    table_ws_translate.put("G","S");
    table_ws_translate.put("C","S");
    table_ws_translate.put("T","W");

    table_ten_translate=new Hashtable();
    table_ten_translate.put("aa","-aa");
    table_ten_translate.put("tt","-aa");
    table_ten_translate.put("at","-at");
    table_ten_translate.put("ta","-ta");
    table_ten_translate.put("ca","-ca");
    table_ten_translate.put("tg","-ca");
    table_ten_translate.put("gt","-gt");
    table_ten_translate.put("ac","-gt");
    table_ten_translate.put("ct","-ct");
    table_ten_translate.put("ag","-ct");
    table_ten_translate.put("ga","-ga");
    table_ten_translate.put("tc","-ga");
    table_ten_translate.put("cg","-cg");
    table_ten_translate.put("gc","-gc");
    table_ten_translate.put("gg","-gg");
    table_ten_translate.put("cc","-gg");

    //
    // Model Parameters
    //
    modparam=new Tm_param[7];

    // ========================================================
    // Parameter file: /home/jgk/projects/tmpred/param/param.tmp

    modparam[0]=new Tm_param();

    modparam[0].table_deltaS_init=new Hashtable();
    modparam[0].table_deltaH_init=new Hashtable();
    modparam[0].table_H_nn=new Hashtable();
    modparam[0].table_S_nn=new Hashtable();
    modparam[0].table_H_mono=new Hashtable();

    // deltaS_init
    modparam[0].table_deltaS_init.put("A", new Double(2.63578811043473));
    modparam[0].table_deltaS_init.put("C", new Double(-4.60854049293735));
    modparam[0].table_deltaS_init.put("G", new Double(-4.56687267801977));
    modparam[0].table_deltaS_init.put("T", new Double(2.7258379273728));
    modparam[0].table_deltaS_init.put("s", new Double(-2.8));
    modparam[0].table_deltaS_init.put("w", new Double(4.1));

    // deltaH_init
    modparam[0].table_deltaH_init.put("A", new Double(1.41884851398012));
    modparam[0].table_deltaH_init.put("C", new Double(-2.01835247107257));
    modparam[0].table_deltaH_init.put("G", new Double(-0.740703588206669));
    modparam[0].table_deltaH_init.put("T", new Double(1.19531513535407));
    modparam[0].table_deltaH_init.put("s", new Double(0.1));
    modparam[0].table_deltaH_init.put("w", new Double(2.3));

    // H_mono
    modparam[0].table_H_mono.put("A", new Double(-15.5177140147424));
    modparam[0].table_H_mono.put("C", new Double(-17.840200113071));
    modparam[0].table_H_mono.put("G", new Double(-18.2150417538429));
    modparam[0].table_H_mono.put("T", new Double(-16.4741049031613));
    modparam[0].table_H_mono.put("s", new Double(-9.36));
    modparam[0].table_H_mono.put("w", new Double(-7.35));


10 


15 


20 


25 


30 


35 


40 


// H_nn
modparam[0].table_H_nn.put("-aa", new Double(-0.55));
modparam[0].table_H_nn.put("-at", new Double(0.15));
modparam[0].table_H_nn.put("-ca", new Double(-0.145));
modparam[0].table_H_nn.put("-cg", new Double(-1.24));
modparam[0].table_H_nn.put("-ct", new Double(0.555));
modparam[0].table_H_nn.put("-ga", new Double(0.155));
modparam[0].table_H_nn.put("-gc", new Double(-0.44));
modparam[0].table_H_nn.put("-gg", new Double(1.36));
modparam[0].table_H_nn.put("-gt", new Double(-0.045));
modparam[0].table_H_nn.put("-ta", new Double(0.15));
modparam[0].table_H_nn.put("SS", new Double(-0.130528857633394));
modparam[0].table_H_nn.put("SW", new Double(-1.0637157585591));
modparam[0].table_H_nn.put("Ss", new Double(-0.357926846466049));
modparam[0].table_H_nn.put("Sw", new Double(0.379281367679472));
modparam[0].table_H_nn.put("WS", new Double(-0.405577627528957));
modparam[0].table_H_nn.put("WW", new Double(-0.726103393389269));
modparam[0].table_H_nn.put("Ws", new Double(1.04679214548348));
modparam[0].table_H_nn.put("Ww", new Double(0.0429919627021588));
modparam[0].table_H_nn.put("sS", new Double(0.543872586888427));
modparam[0].table_H_nn.put("sW", new Double(0.575455160673995));
modparam[0].table_H_nn.put("wS", new Double(0.865853023978453));
modparam[0].table_H_nn.put("wW", new Double(0.789896125879916));

// S_nn
modparam[0].table_S_nn.put("-aa", new Double(-22.2));
modparam[0].table_S_nn.put("-at", new Double(-20.4));
modparam[0].table_S_nn.put("-ca", new Double(-22.7));
modparam[0].table_S_nn.put("-cg", new Double(-27.2));
modparam[0].table_S_nn.put("-ct", new Double(-21));
modparam[0].table_S_nn.put("-ga", new Double(-22.2));
modparam[0].table_S_nn.put("-gc", new Double(-24.4));
modparam[0].table_S_nn.put("-gg", new Double(-19.9));
modparam[0].table_S_nn.put("-gt", new Double(-22.4));
modparam[0].table_S_nn.put("-ta", new Double(-21.3));
modparam[0].table_S_nn.put("DL", new Double(-32.6588469014549));
modparam[0].table_S_nn.put("LD", new Double(-31.5601899937808));
modparam[0].table_S_nn.put("LL", new Double(-46.8237053519369));
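
// How the original program combines the table_S_nn entries is not shown in
// this part of the listing. The sketch below assumes the simplest reading: a
// lowercase DNA sequence is cut into overlapping dinucleotides, each one is
// mapped through table_ten_translate, and the matching S_nn values are
// added. The class EntropySum and its method sum are hypothetical names;
// only the ten "-xx" values of modparam[0].table_S_nn are taken from the
// listing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of the original program.
class EntropySum {

    static final Map<String, String> TEN = new HashMap<>();
    static final Map<String, Double> S_NN = new HashMap<>();

    static {
        // Same sixteen entries as table_ten_translate in the listing.
        String[][] pairs = {
            {"aa", "-aa"}, {"tt", "-aa"}, {"at", "-at"}, {"ta", "-ta"},
            {"ca", "-ca"}, {"tg", "-ca"}, {"gt", "-gt"}, {"ac", "-gt"},
            {"ct", "-ct"}, {"ag", "-ct"}, {"ga", "-ga"}, {"tc", "-ga"},
            {"cg", "-cg"}, {"gc", "-gc"}, {"gg", "-gg"}, {"cc", "-gg"}
        };
        for (String[] p : pairs) {
            TEN.put(p[0], p[1]);
        }
        // The ten "-xx" entries of modparam[0].table_S_nn.
        String[] keys = {"-aa", "-at", "-ca", "-cg", "-ct", "-ga", "-gc", "-gg", "-gt", "-ta"};
        double[] values = {-22.2, -20.4, -22.7, -27.2, -21.0, -22.2, -24.4, -19.9, -22.4, -21.3};
        for (int i = 0; i < keys.length; i++) {
            S_NN.put(keys[i], values[i]);
        }
    }

    // Adds the S_nn value of every overlapping dinucleotide in the sequence.
    // Assumption: a plain sum with no initiation or end terms.
    static double sum(String sequence) {
        double total = 0.0;
        for (int i = 0; i + 1 < sequence.length(); i++) {
            String pair = sequence.substring(i, i + 2);
            String key = TEN.get(pair);
            if (key == null) {
                throw new IllegalArgumentException("unexpected dinucleotide: " + pair);
            }
            total += S_NN.get(key);
        }
        return total;
    }

    public static void main(String[] args) {
        // "acgt" contains ac, cg and gt, which map to -gt, -cg and -gt.
        System.out.println(sum("acgt"));
    }
}
```

// For "acgt" the three terms are -22.4, -27.2 and -22.4, so the sum is about
// -72.0. The deltaS_init terms and the DL, LD and LL entries are left out of
// this sketch because their use is not visible here.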


//
// Parameter file: /home/jgk/projects/tmpred/param/4.4.0.3.48.myhomo.ttall3
//

modparam[1]=new Tm_param();

modparam[1].table_deltaS_init=new Hashtable();
modparam[1].table_deltaH_init=new Hashtable();
modparam[1].table_H_nn=new Hashtable();
modparam[1].table_S_nn=new Hashtable();
modparam[1].table_H_mono=new Hashtable();


// deltaS_init
modparam[1].table_deltaS_init.put("A", new Double(-0.417636505370175));
modparam[1].table_deltaS_init.put("C", new Double(-0.336962265530943));
modparam[1].table_deltaS_init.put("G", new Double(0.158722707567369));
modparam[1].table_deltaS_init.put("T", new Double(-0.208276246703806));
modparam[1].table_deltaS_init.put("s", new Double(-0.372560237115056));
modparam[1].table_deltaS_init.put("w", new Double(-0.369231842748097));

// deltaH_init
modparam[1].table_deltaH_init.put("A", new Double(0.782799015317204));
modparam[1].table_deltaH_init.put("C", new Double(0.505143958976093));
modparam[1].table_deltaH_init.put("G", new Double(-0.278435409858967));
modparam[1].table_deltaH_init.put("T", new Double(0.435388592178298));
modparam[1].table_deltaH_init.put("s", new Double(0.674274914671847));
modparam[1].table_deltaH_init.put("w", new Double(0.737204216419899));

// H_mono

// H_nn
modparam[1].table_H_nn.put("-aa", new Double(-7.48976721304652));
modparam[1].table_H_nn.put("-at", new Double(-7.32484387195421));
modparam[1].table_H_nn.put("-ca", new Double(-7.87301403612122));
modparam[1].table_H_nn.put("-cg", new Double(-8.42170498069151));
modparam[1].table_H_nn.put("-ct", new Double(-7.74869970168917));
modparam[1].table_H_nn.put("-ga", new Double(-7.76121630019634));
modparam[1].table_H_nn.put("-gc", new Double(-8.50240032966233));
modparam[1].table_H_nn.put("-gg", new Double(-8.25010916492258));
modparam[1].table_H_nn.put("-gt", new Double(-7.93137471336983));
modparam[1].table_H_nn.put("-ta", new Double(-7.19824578597169));
modparam[1].table_H_nn.put("DL", new Double(-7.64614418904991));
modparam[1].table_H_nn.put("LD", new Double(-8.33674742394922));
modparam[1].table_H_nn.put("LL", new Double(-8.48734357858688));

// S_nn
modparam[1].table_S_nn.put("-aa", new Double(-21.0183983393758));
modparam[1].table_S_nn.put("-at", new Double(-20.9296841275317));
modparam[1].table_S_nn.put("-ca", new Double(-20.9035893027225));
modparam[1].table_S_nn.put("-cg", new Double(-20.6106325122583));
modparam[1].table_S_nn.put("-ct", new Double(-20.9057484730661));
modparam[1].table_S_nn.put("-ga", new Double(-20.9212322254303));
modparam[1].table_S_nn.put("-gc", new Double(-20.5901247977925));
modparam[1].table_S_nn.put("-gg", new Double(-20.7858543446338));
modparam[1].table_S_nn.put("-gt", new Double(-20.8689105244148));
modparam[1].table_S_nn.put("-ta", new Double(-20.9664930077991));
modparam[1].table_S_nn.put("AA", new Double(-21.5617306575376));
modparam[1].table_S_nn.put("AC", new Double(-20.3786877774106));
modparam[1].table_S_nn.put("AG", new Double(-19.7518872341502));
modparam[1].table_S_nn.put("AT", new Double(-21.2727546056293));
modparam[1].table_S_nn.put("Aa", new Double(-20.938973530993));
modparam[1].table_S_nn.put("Ac", new Double(-20.2351610010894));
modparam[1].table_S_nn.put("Ag", new Double(-19.4518499755888));
modparam[1].table_S_nn.put("At", new Double(-18.0784147699652));
modparam[1].table_S_nn.put("CA", new Double(-20.7498315076301));
modparam[1].table_S_nn.put("CC", new Double(-18.5922344843667));
modparam[1].table_S_nn.put("CG", new Double(-18.0631881798201));
modparam[1].table_S_nn.put("CT", new Double(-20.9602092321618));
modparam[1].table_S_nn.put("Ca", new Double(-19.34976501959));
modparam[1].table_S_nn.put("Cc", new Double(-20.0636169441132));
modparam[1].table_S_nn.put("Cg", new Double(-18.939750726347));
modparam[1].table_S_nn.put("Ct", new Double(-21.6068106111011));
modparam[1].table_S_nn.put("GA", new Double(-20.6285173614998));
modparam[1].table_S_nn.put("GC", new Double(-21.1393042416962));
modparam[1].table_S_nn.put("GG", new Double(-19.2292577000144));
modparam[1].table_S_nn.put("GT", new Double(-20.5200676805333));
modparam[1].table_S_nn.put("Ga", new Double(-18.7514389680752));
modparam[1].table_S_nn.put("Gc", new Double(-18.6326804946465));
modparam[1].table_S_nn.put("Gg", new Double(-19.4129295709103));
modparam[1].table_S_nn.put("Gt", new Double(-19.7139582131827));
modparam[1].table_S_nn.put("TA", new Double(-21.7225383328128));
modparam[1].table_S_nn.put("TC", new Double(-19.016695S249448)); // "S" marks a digit that is unreadable in the scanned listing
modparam[1].table_S_nn.put("TG", new Double(-18.1432831834084));
modparam[1].table_S_nn.put("TT", new Double(-21.702655506544));
modparam[1].table_S_nn.put("Ta", new Double(-21.6061975277002));
modparam[1].table_S_nn.put("Tc", new Double(-21.5479504447966));
modparam[1].table_S_nn.put("Tg", new Double(-20.6436696433912));
modparam[1].table_S_nn.put("Tt", new Double(-21.6266985715776));
modparam[1].table_S_nn.put("aA", new Double(-20.5319311697475));
modparam[1].table_S_nn.put("aC", new Double(-19.025255715981));
modparam[1].table_S_nn.put("aG", new Double(-20.3463306063697));
modparam[1].table_S_nn.put("aT", new Double(-20.3507644210827));
modparam[1].table_S_nn.put("cA", new Double(-18.6568044948628));
modparam[1].table_S_nn.put("cC", new Double(-20.8609785312075));
modparam[1].table_S_nn.put("cG", new Double(-19.1534484621275));
modparam[1].table_S_nn.put("cT", new Double(-18.949763882851));
modparam[1].table_S_nn.put("gA", new Double(-19.9252395614033));
modparam[1].table_S_nn.put("gC", new Double(-19.5747039562859));
modparam[1].table_S_nn.put("gG", new Double(-19.4942636766488));
modparam[1].table_S_nn.put("gT", new Double(-19.9443842009211));
modparam[1].table_S_nn.put("tA", new Double(-22.5442702941593));
modparam[1].table_S_nn.put("tC", new Double(-19.7656230175562));
modparam[1].table_S_nn.put("tG", new Double(-20.6803075424075));
modparam[1].table_S_nn.put("tT", new Double(-20.9918041722088));

modparam[2]=new Tm_param();

modparam[2].table_deltaS_init=new Hashtable();
modparam[2].table_deltaH_init=new Hashtable();
modparam[2].table_H_nn=new Hashtable();
modparam[2].table_S_nn=new Hashtable();
modparam[2].table_H_mono=new Hashtable();

// deltaS_init
modparam[2].table_deltaS_init.put("A", new Double(-0.407793407213917));
modparam[2].table_deltaS_init.put("C", new Double(-0.213527372713559));
modparam[2].table_deltaS_init.put("G", new Double(0.35848328509707));
modparam[2].table_deltaS_init.put("T", new Double(0.203944086326224));
modparam[2].table_deltaS_init.put("s", new Double(-0.372560237115056));
modparam[2].table_deltaS_init.put("w", new Double(-0.369231842748097));

// deltaH_init
modparam[2].table_deltaH_init.put("A", new Double(0.486242607856365));
modparam[2].table_deltaH_init.put("C", new Double(-0.0536311403356718));
modparam[2].table_deltaH_init.put("G", new Double(-0.787818472323185));
modparam[2].table_deltaH_init.put("T", new Double(-0.577191417685245));
modparam[2].table_deltaH_init.put("s", new Double(0.674274914671847));
modparam[2].table_deltaH_init.put("w", new Double(0.737204216419899));

// H_mono


// H_nn
modparam[2].table_H_nn.put("-aa", new Double(-7.48976721304652));
modparam[2].table_H_nn.put("-at", new Double(-7.32484387195421));
modparam[2].table_H_nn.put("-ca", new Double(-7.87301403612122));
modparam[2].table_H_nn.put("-cg", new Double(-8.42170498069151));
modparam[2].table_H_nn.put("-ct", new Double(-7.74869970168917));
modparam[2].table_H_nn.put("-ga", new Double(-7.76121630019634));
modparam[2].table_H_nn.put("-gc", new Double(-8.50240032966233));
modparam[2].table_H_nn.put("-gg", new Double(-8.25010916492258));
modparam[2].table_H_nn.put("-gt", new Double(-7.93137471336983));
modparam[2].table_H_nn.put("-ta", new Double(-7.19824578597169));
modparam[2].table_H_nn.put("DL", new Double(-7.57S00309750847)); // "S" marks a digit that is unreadable in the scanned listing
modparam[2].table_H_nn.put("LD", new Double(-8.62638977709659));
modparam[2].table_H_nn.put("LL", new Double(-8.30977887038586));

// S_nn
modparam[2].table_S_nn.put("-aa", new Double(-21.0183983393758));
modparam[2].table_S_nn.put("-at", new Double(-20.9296841275317));
modparam[2].table_S_nn.put("-ca", new Double(-20.9035893027225));
modparam[2].table_S_nn.put("-cg", new Double(-20.6106325122583));
modparam[2].table_S_nn.put("-ct", new Double(-20.9057484730661));
modparam[2].table_S_nn.put("-ga", new Double(-20.9212322254303));
modparam[2].table_S_nn.put("-gc", new Double(-20.5901247977925));
modparam[2].table_S_nn.put("-gg", new Double(-20.7858543446338));
modparam[2].table_S_nn.put("-gt", new Double(-20.8689105244148));
modparam[2].table_S_nn.put("-ta", new Double(-20.9664930077991));
modparam[2].table_S_nn.put("AA", new Double(-20.9447059251337));
modparam[2].table_S_nn.put("AC", new Double(-21.3977869950092));
modparam[2].table_S_nn.put("AG", new Double(-19.3180598469252));
modparam[2].table_S_nn.put("AT", new Double(-21.7372825480374));
modparam[2].table_S_nn.put("Aa", new Double(-20.519746311685));
modparam[2].table_S_nn.put("Ac", new Double(-20.1102797514339));
modparam[2].table_S_nn.put("Ag", new Double(-19.8837896896234));
modparam[2].table_S_nn.put("At", new Double(-16.6189306374549));
modparam[2].table_S_nn.put("CA", new Double(-21.7677432772647));
modparam[2].table_S_nn.put("CC", new Double(-18.6955344509688));
modparam[2].table_S_nn.put("CG", new Double(-17.7951906152784));
modparam[2].table_S_nn.put("CT", new Double(-20.9908366322904));
modparam[2].table_S_nn.put("Ca", new Double(-19.5503830674945));
modparam[2].table_S_nn.put("Cc", new Double(-20.3830477876892));
modparam[2].table_S_nn.put("Cg", new Double(-18.9002469686417));
modparam[2].table_S_nn.put("Ct", new Double(-21.408268210924));
modparam[2].table_S_nn.put("GA", new Double(-20.8162756238568));
modparam[2].table_S_nn.put("GC", new Double(-20.6272454690228));
modparam[2].table_S_nn.put("GG", new Double(-18.323886570111));
modparam[2].table_S_nn.put("GT", new Double(-20.3270306928687));
modparam[2].table_S_nn.put("Ga", new Double(-18.8840728478426));
modparam[2].table_S_nn.put("Gc", new Double(-19.469767014314));
modparam[2].table_S_nn.put("Gg", new Double(-20.5553996930734));
modparam[2].table_S_nn.put("Gt", new Double(-19.0425869495101));
modparam[2].table_S_nn.put("TA", new Double(-22.4469367362978));
modparam[2].table_S_nn.put("TC", new Double(-17.8972135549923));
modparam[2].table_S_nn.put("TG", new Double(-18.5655748363087));
modparam[2].table_S_nn.put("TT", new Double(-21.8796937437075));
modparam[2].table_S_nn.put("Ta", new Double(-22.7345090588341));
modparam[2].table_S_nn.put("Tc", new Double(-20.3745455995998));
modparam[2].table_S_nn.put("Tg", new Double(-20.7769612524783));
modparam[2].table_S_nn.put("Tt", new Double(-21.4720373674347));
modparam[2].table_S_nn.put("aA", new Double(-20.5263443813538));
modparam[2].table_S_nn.put("aC", new Double(-19.0058736831775));
modparam[2].table_S_nn.put("aG", new Double(-20.642144617583));
modparam[2].table_S_nn.put("aT", new Double(-20.3395587841216));
modparam[2].table_S_nn.put("cA", new Double(-18.9416426287649));
modparam[2].table_S_nn.put("cC", new Double(-20.4556223524877));
modparam[2].table_S_nn.put("cG", new Double(-19.4702848936854));
modparam[2].table_S_nn.put("cT", new Double(-18.7713856854974));
modparam[2].table_S_nn.put("gA", new Double(-20.6980212771351));
modparam[2].table_S_nn.put("gC", new Double(-19.6358318414429));
modparam[2].table_S_nn.put("gG", new Double(-20.2590169785449));
modparam[2].table_S_nn.put("gT", new Double(-19.6460069771352));
modparam[2].table_S_nn.put("tA", new Double(-19.56763639202));
modparam[2].table_S_nn.put("tC", new Double(-19.4028441502224));
modparam[2].table_S_nn.put("tG", new Double(-21.5258339764989));
modparam[2].table_S_nn.put("tT", new Double(-21.9790781626011));


// Parameter file: /home/jgk/projects/tmpred/param/4.4.4.12.12.myhomo.ttall5

modparam[3]=new Tm_param();

modparam[3].table_deltaS_init=new Hashtable();
modparam[3].table_deltaH_init=new Hashtable();
modparam[3].table_H_nn=new Hashtable();
modparam[3].table_S_nn=new Hashtable();
modparam[3].table_H_mono=new Hashtable();

// deltaS_init
modparam[3].table_deltaS_init.put("A", new Double(-5.14449897466861));
modparam[3].table_deltaS_init.put("C", new Double(-6.98913655256153));
modparam[3].table_deltaS_init.put("G", new Double(-5.36464180365539));
modparam[3].table_deltaS_init.put("T", new Double(-4.19684771688626));
modparam[3].table_deltaS_init.put("s", new Double(-0.372560237115056));
modparam[3].table_deltaS_init.put("w", new Double(-0.369231842748097));

// deltaH_init
modparam[3].table_deltaH_init.put("A", new Double(-0.308850061360838));
modparam[3].table_deltaH_init.put("C", new Double(-0.972721500770524));
modparam[3].table_deltaH_init.put("G", new Double(-0.856281675074907));
modparam[3].table_deltaH_init.put("T", new Double(-0.254870883861951));
modparam[3].table_deltaH_init.put("s", new Double(0.674274914671847));
modparam[3].table_deltaH_init.put("w", new Double(0.737204216419899));

// H_mono
modparam[3].table_H_mono.put("A", new Double(-13.0850625834165));
modparam[3].table_H_mono.put("C", new Double(-15.441S946710241)); // "S" marks a digit that is unreadable in the scanned listing
modparam[3].table_H_mono.put("G", new Double(-15.5290562655932));
modparam[3].table_H_mono.put("T", new Double(-12.0962280546374));


// H_nn
modparam[3].table_H_nn.put("-aa", new Double(-7.48976721304652));
modparam[3].table_H_nn.put("-at", new Double(-7.32484387195421));
modparam[3].table_H_nn.put("-ca", new Double(-7.87301403612122));
modparam[3].table_H_nn.put("-cg", new Double(-8.42170498069151));
modparam[3].table_H_nn.put("-ct", new Double(-7.74869970168917));
modparam[3].table_H_nn.put("-ga", new Double(-7.76121630019634));
modparam[3].table_H_nn.put("-gc", new Double(-8.50240032966233));
modparam[3].table_H_nn.put("-gg", new Double(-8.25010916492258));
modparam[3].table_H_nn.put("-gt", new Double(-7.93137471336983));
modparam[3].table_H_nn.put("-ta", new Double(-7.19824578597169));
modparam[3].table_H_nn.put("SS", new Double(-4.94211739183454));
modparam[3].table_H_nn.put("SW", new Double(-3.2298329995419));
modparam[3].table_H_nn.put("Ss", new Double(-2.65193801394363));
modparam[3].table_H_nn.put("Sw", new Double(-1.92736294288536));
modparam[3].table_H_nn.put("WS", new Double(-3.74842584308002));
modparam[3].table_H_nn.put("WW", new Double(0.250976824915542));
modparam[3].table_H_nn.put("Ws", new Double(-2.52230884729791));
modparam[3].table_H_nn.put("Ww", new Double(-2.47894638749835));
modparam[3].table_H_nn.put("sS", new Double(-2.08573406157705));
modparam[3].table_H_nn.put("sW", new Double(-3.16193826406854));
modparam[3].table_H_nn.put("wS", new Double(-2.10629510531988));
modparam[3].table_H_nn.put("wW", new Double(-2.34429223089846));


// S_nn
modparam[3].table_S_nn.put("-aa", new Double(-21.0183983393758));
modparam[3].table_S_nn.put("-at", new Double(-20.9296841275317));
modparam[3].table_S_nn.put("-ca", new Double(-20.9035893027225));
modparam[3].table_S_nn.put("-cg", new Double(-20.6106325122583));
modparam[3].table_S_nn.put("-ct", new Double(-20.9057484730661));
modparam[3].table_S_nn.put("-ga", new Double(-20.9212322254303));
modparam[3].table_S_nn.put("-gc", new Double(-20.5901247977925));
modparam[3].table_S_nn.put("-gg", new Double(-20.7858543446338));
modparam[3].table_S_nn.put("-gt", new Double(-20.8689105244148));
modparam[3].table_S_nn.put("-ta", new Double(-20.9664930077991));
modparam[3].table_S_nn.put("SS", new Double(-52.7341097218716));
modparam[3].table_S_nn.put("SW", new Double(-48.0173532462563));
modparam[3].table_S_nn.put("Ss", new Double(-24.7053248747157));
modparam[3].table_S_nn.put("Sw", new Double(-25.2483086657963));
modparam[3].table_S_nn.put("WS", new Double(-46.7082776752559));
modparam[3].table_S_nn.put("WW", new Double(-33.9766242546579));
modparam[3].table_S_nn.put("Ws", new Double(-24.593626756038));
modparam[3].table_S_nn.put("Ww", new Double(-24.1779478033306));
modparam[3].table_S_nn.put("sS", new Double(-24.1344000692216));
modparam[3].table_S_nn.put("sW", new Double(-23.8911849011862));
modparam[3].table_S_nn.put("wS", new Double(-24.5041713703224));
modparam[3].table_S_nn.put("wW", new Double(-23.8041757177075));


// =================================================
//
// Parameter file: /home/jgk/projects/tmpred/param/4.4.4.12.3.myhomo.ttall6

modparam[4]=new Tm_param();

modparam[4].table_deltaS_init=new Hashtable();
modparam[4].table_deltaH_init=new Hashtable();
modparam[4].table_H_nn=new Hashtable();
modparam[4].table_S_nn=new Hashtable();
modparam[4].table_H_mono=new Hashtable();


// deltaS_init
modparam[4].table_deltaS_init.put("A", new Double(-1.80759598042465));
modparam[4].table_deltaS_init.put("C", new Double(-1.64205431364765));
modparam[4].table_deltaS_init.put("G", new Double(-1.53474825333411));
modparam[4].table_deltaS_init.put("T", new Double(-1.56722518744529));
modparam[4].table_deltaS_init.put("s", new Double(-0.372560237115056));
modparam[4].table_deltaS_init.put("w", new Double(-0.369231842748097));

// deltaH_init
modparam[4].table_deltaH_init.put("A", new Double(0.562011155975154));
modparam[4].table_deltaH_init.put("C", new Double(0.S01526805191486)); // "S" marks a digit that is unreadable in the scanned listing
modparam[4].table_deltaH_init.put("G", new Double(-0.223329383181387));
modparam[4].table_deltaH_init.put("T", new Double(0.404061229265622));
modparam[4].table_deltaH_init.put("s", new Double(0.674274914671847));
modparam[4].table_deltaH_init.put("w", new Double(0.737204216419899));

// H_mono
modparam[4].table_H_mono.put("A", new Double(-17.5329265252141));
modparam[4].table_H_mono.put("C", new Double(-18.5157490113392));
modparam[4].table_H_mono.put("G", new Double(-18.8714990370891));
modparam[4].table_H_mono.put("T", new Double(-17.430598429616));


// H_nn
modparam[4].table_H_nn.put("-aa", new Double(-7.48976721304652));
modparam[4].table_H_nn.put("-at", new Double(-7.32484387195421));
modparam[4].table_H_nn.put("-ca", new Double(-7.87301403612122));
modparam[4].table_H_nn.put("-cg", new Double(-8.42170498069151));
modparam[4].table_H_nn.put("-ct", new Double(-7.74869970168917));
modparam[4].table_H_nn.put("-ga", new Double(-7.76121630019634));
modparam[4].table_H_nn.put("-gc", new Double(-8.50240032966233));
modparam[4].table_H_nn.put("-gg", new Double(-8.25010916492258));
modparam[4].table_H_nn.put("-gt", new Double(-7.93137471336983));
modparam[4].table_H_nn.put("-ta", new Double(-7.19824578597169));
modparam[4].table_H_nn.put("SS", new Double(-0.273099798601551));
modparam[4].table_H_nn.put("SW", new Double(0.419286733365625));
modparam[4].table_H_nn.put("Ss", new Double(-0.541700784403176));
modparam[4].table_H_nn.put("Sw", new Double(0.350546367435068));
modparam[4].table_H_nn.put("WS", new Double(-0.296817518810437));
modparam[4].table_H_nn.put("WW", new Double(0.175879861412384));
modparam[4].table_H_nn.put("Ws", new Double(0.328320257160163));
modparam[4].table_H_nn.put("Ww", new Double(-0.169293035915045));
modparam[4].table_H_nn.put("sS", new Double(0.326542458181525));
modparam[4].table_H_nn.put("sW", new Double(-0.938829019706826));
modparam[4].table_H_nn.put("wS", new Double(0.300181620507486));
modparam[4].table_H_nn.put("wW", new Double(0.316641864778047));

// S_nn
modparam[4].table_S_nn.put("-aa", new Double(-21.0183983393758));
modparam[4].table_S_nn.put("-at", new Double(-20.9296841275317));
modparam[4].table_S_nn.put("-ca", new Double(-20.9035893027225));
modparam[4].table_S_nn.put("-cg", new Double(-20.6106325122583));
modparam[4].table_S_nn.put("-ct", new Double(-20.9057484730661));
modparam[4].table_S_nn.put("-ga", new Double(-20.9212322254303));
modparam[4].table_S_nn.put("-gc", new Double(-20.5901247977925));
modparam[4].table_S_nn.put("-gg", new Double(-20.7858543446338));
modparam[4].table_S_nn.put("-gt", new Double(-20.8689105244148));
modparam[4].table_S_nn.put("-ta", new Double(-20.9664930077991));
modparam[4].table_S_nn.put("DL", new Double(-22.2388993255255));
modparam[4].table_S_nn.put("LD", new Double(-23.9336954436759));
modparam[4].table_S_nn.put("LL", new Double(-48.2573001896195));

// Parameter file: /home/jgk/projects/tmpred/param/4.4.4.48.3.myhomo.ttall3

modparam[5]=new Tm_param();

modparam[5].table_deltaS_init=new Hashtable();
modparam[5].table_deltaH_init=new Hashtable();
modparam[5].table_H_nn=new Hashtable();
modparam[5].table_S_nn=new Hashtable();
modparam[5].table_H_mono=new Hashtable();

// deltaS_init
modparam[5].table_deltaS_init.put("A", new Double(-0.888469343292043));
modparam[5].table_deltaS_init.put("C", new Double(-1.02543607545834));
modparam[5].table_deltaS_init.put("G", new Double(-0.541648874710394));
modparam[5].table_deltaS_init.put("T", new Double(-0.65983661685686));
modparam[5].table_deltaS_init.put("s", new Double(-0.372560237115056));
modparam[5].table_deltaS_init.put("w", new Double(-0.369231842748097));

// deltaH_init
modparam[5].table_deltaH_init.put("A", new Double(0.615705041687039));
modparam[5].table_deltaH_init.put("C", new Double(0.174168418382924));
modparam[5].table_deltaH_init.put("G", new Double(-0.316854740516605));
modparam[5].table_deltaH_init.put("T", new Double(0.468853851084089));
modparam[5].table_deltaH_init.put("s", new Double(0.674274914671847));
modparam[5].table_deltaH_init.put("w", new Double(0.737204216419899));


15 


20 


25 


30 


35 


40 


45 


// H_mono
modparam[5].table_H_mono.put("A", new Double(-12.2836541825738));
modparam[5].table_H_mono.put("C", new Double(-12.8366937840179));
modparam[5].table_H_mono.put("G", new Double(-13.1042874575601));
modparam[5].table_H_mono.put("T", new Double(-12.1930059340835));

// H_nn
modparam[5].table_H_nn.put("-aa", new Double(-7.48976721304652));
modparam[5].table_H_nn.put("-at", new Double(-7.32484387195421));
modparam[5].table_H_nn.put("-ca", new Double(-7.87301403612122));
modparam[5].table_H_nn.put("-cg", new Double(-8.42170498069151));
modparam[5].table_H_nn.put("-ct", new Double(-7.74869970168917));
modparam[5].table_H_nn.put("-ga", new Double(-7.76121630019634));
modparam[5].table_H_nn.put("-gc", new Double(-8.50240032966233));
modparam[5].table_H_nn.put("-gg", new Double(-8.25010916492258));
modparam[5].table_H_nn.put("-gt", new Double(-7.93137471336983));
modparam[5].table_H_nn.put("-ta", new Double(-7.19824578597169));
modparam[5].table_H_nn.put("AA", new Double(-0.178443264646713));
modparam[5].table_H_nn.put("AC", new Double(-0.626677714801197));
modparam[5].table_H_nn.put("AG", new Double(-0.596829770378585));
modparam[5].table_H_nn.put("AT", new Double(-0.487815178989618));
modparam[5].table_H_nn.put("Aa", new Double(-0.771469450002954));
modparam[5].table_H_nn.put("Ac", new Double(-1.00851806983534));
modparam[5].table_H_nn.put("Ag", new Double(-1.42002924459608));
modparam[5].table_H_nn.put("At", new Double(-2.1602568782306));
modparam[5].table_H_nn.put("CA", new Double(-0.551742312137283));
modparam[5].table_H_nn.put("CC", new Double(-0.706523147036115));
modparam[5].table_H_nn.put("CG", new Double(-1.0219743340256));
modparam[5].table_H_nn.put("CT", new Double(-0.343403406607851));
modparam[5].table_H_nn.put("Ca", new Double(-1.42098576721322));
modparam[5].table_H_nn.put("Cc", new Double(-1.16189045546058));
modparam[5].table_H_nn.put("Cg", new Double(-1.50512036091287));
modparam[5].table_H_nn.put("Ct", new Double(-0.286178902087989));
modparam[5].table_H_nn.put("GA", new Double(-0.325781391314861));
modparam[5].table_H_nn.put("GC", new Double(0.0408233779793723));
modparam[5].table_H_nn.put("GG", new Double(-0.712259747306718));
modparam[5].table_H_nn.put("GT", new Double(-0.228069270366563));
modparam[5].table_H_nn.put("Ga", new Double(-1.53023383659402));
modparam[5].table_H_nn.put("Gc", new Double(-1.42345356134335));
modparam[5].table_H_nn.put("Gg", new Double(-1.16308519064659));
modparam[5].table_H_nn.put("Gt", new Double(-0.902484755483895));
modparam[5].table_H_nn.put("TA", new Double(-0.246725295583488));
modparam[5].table_H_nn.put("TC", new Double(-0.920526169197009));
modparam[5].table_H_nn.put("TG", new Double(-1.279823124502));
modparam[5].table_H_nn.put("TT", new Double(-0.317238648246969));
modparam[5].table_H_nn.put("Ta", new Double(-0.656770342200844));
modparam[5].table_H_nn.put("Tc", new Double(-0.832334138900636));
modparam[5].table_H_nn.put("Tg", new Double(-0.807723137936193));
modparam[5].table_H_nn.put("Tt", new Double(-0.879062052445067));
modparam[5].table_H_nn.put("aA", new Double(-0.959947989068327));
modparam[5].table_H_nn.put("aC", new Double(-1.40288154815212));
modparam[5].table_H_nn.put("aG", new Double(-0.791217583005563));
modparam[5].table_H_nn.put("aT", new Double(-1.02678440448276));
modparam[5].table_H_nn.put("cA", new Double(-1.72086127613672));
modparam[5].table_H_nn.put("cC", new Double(-0.706538514572194));
modparam[5].table_H_nn.put("cG", new Double(-1.27407148022265));
modparam[5].table_H_nn.put("cT", new Double(-1.67492049706038));
modparam[5].table_H_nn.put("gA", new Double(-1.32646498674451));
modparam[5].table_H_nn.put("gC", new Double(-1.21084101077631));
modparam[5].table_H_nn.put("gG", new Double(-1.17115932925001));
modparam[5].table_H_nn.put("gT", new Double(-1.51513536216474));
modparam[5].table_H_nn.put("tA", new Double(-0.166992658485835));
modparam[5].table_H_nn.put("tC", new Double(-0.957616290525477));
modparam[5].table_H_nn.put("tG", new Double(-0.724202653405165));
modparam[5].table_H_nn.put("tT", new Double(-1.08943892482195));

// S_nn
modparam[5].table_S_nn.put("-aa", new Double(-21.0183983393758));
modparam[5].table_S_nn.put("-at", new Double(-20.9296841275317));
modparam[5].table_S_nn.put("-ca", new Double(-20.9035893027225));
modparam[5].table_S_nn.put("-cg", new Double(-20.6106325122583));
modparam[5].table_S_nn.put("-ct", new Double(-20.9057484730661));
modparam[5].table_S_nn.put("-ga", new Double(-20.9212322254303));
modparam[5].table_S_nn.put("-gc", new Double(-20.5901247977925));
modparam[5].table_S_nn.put("-gg", new Double(-20.7858543446338));
modparam[5].table_S_nn.put("-gt", new Double(-20.8689105244148));
modparam[5].table_S_nn.put("-ta", new Double(-20.9664930077991));
modparam[5].table_S_nn.put("DL", new Double(-18.3725054920304));
modparam[5].table_S_nn.put("LD", new Double(-18.5867411854372));
modparam[5].table_S_nn.put("LL", new Double(-33.7400646113203));



modparam[6] = new Tm_param();
modparam[6].table_deltaS_init = new Hashtable();
modparam[6].table_deltaH_init = new Hashtable();
modparam[6].table_H_nn = new Hashtable();
modparam[6].table_S_nn = new Hashtable();
modparam[6].table_H_mono = new Hashtable();


// deltaS_init
modparam[6].table_deltaS_init.put("A", new Double(-1.0899853854154));
modparam[6].table_deltaS_init.put("C", new Double(-1.26650514222434));
modparam[6].table_deltaS_init.put("G", new Double(-0.636096340366464));
modparam[6].table_deltaS_init.put("T", new Double(-0.692536920626161));
modparam[6].table_deltaS_init.put("s", new Double(-0.372560237115056));
modparam[6].table_deltaS_init.put("w", new Double(-0.369211842748097));

// deltaH_init
modparam[6].table_deltaH_init.put("A", new Double(0.346600533782424));
modparam[6].table_deltaH_init.put("C", new Double(-0.571391474074441));
modparam[6].table_deltaH_init.put("G", new Double(-0.431395953242384));
modparam[6].table_deltaH_init.put("T", new Double(-0.175680165547273));
modparam[6].table_deltaH_init.put("s", new Double(0.674274914671847));
modparam[6].table_deltaH_init.put("w", new Double(0.737204216419899));

// H_mono
modparam[6].table_H_mono.put("A", new Double(-13.0754325103332));
modparam[6].table_H_mono.put("C", new Double(-13.646666116136));
modparam[6].table_H_mono.put("G", new Double(-13.6293972843139));
modparam[6].table_H_mono.put("T", new Double(-12.9859631483842));


// H_nn
modparam[6].table_H_nn.put("-aa", new Double(-7.48976721304652));
modparam[6].table_H_nn.put("-at", new Double(-7.32484387195421));
modparam[6].table_H_nn.put("-ca", new Double(-7.87301403612122));
modparam[6].table_H_nn.put("-cg", new Double(-8.42170498069151));
modparam[6].table_H_nn.put("-ct", new Double(-7.74869970168917));
modparam[6].table_H_nn.put("-ga", new Double(-7.76121630019634));
modparam[6].table_H_nn.put("-gc", new Double(-8.50240032966233));
modparam[6].table_H_nn.put("-gg", new Double(-8.25010916492258));
modparam[6].table_H_nn.put("-gt", new Double(-7.93137471336983));
modparam[6].table_H_nn.put("-ta", new Double(-7.19824578597169));
modparam[6].table_H_nn.put("AA", new Double(-0.550329001961804));
modparam[6].table_H_nn.put("AC", new Double(-0.547445535909528));
modparam[6].table_H_nn.put("AG", new Double(-0.921006379530219));
modparam[6].table_H_nn.put("AT", new Double(-0.344957768635853));
modparam[6].table_H_nn.put("Aa", new Double(-0.754556907253045));
modparam[6].table_H_nn.put("Ac", new Double(-1.24531973714279));
modparam[6].table_H_nn.put("Ag", new Double(-1.14112776038759));
modparam[6].table_H_nn.put("At", new Double(-2.40661512922826));
modparam[6].table_H_nn.put("CA", new Double(-0.599843310599913));
modparam[6].table_H_nn.put("CC", new Double(-0.979896995845449));
modparam[6].table_H_nn.put("CG", new Double(-1.40702090497018));
modparam[6].table_H_nn.put("CT", new Double(-0.544368218865807));
modparam[6].table_H_nn.put("Ca", new Double(-0.84623437898934));
modparam[6].table_H_nn.put("Cc", new Double(-1.22861371811788));
modparam[6].table_H_nn.put("Cg", new Double(-1.58027912989158));
modparam[6].table_H_nn.put("Ct", new Double(-0.475151811986212));
modparam[6].table_H_nn.put("GA", new Double(-0.162584226406409));
modparam[6].table_H_nn.put("GC", new Double(-0.587156187857858));
modparam[6].table_H_nn.put("GG", new Double(-1.41663092804754));
modparam[6].table_H_nn.put("GT", new Double(-0.583688894933071));
modparam[6].table_H_nn.put("Ga", new Double(-1.6538514342035));
modparam[6].table_H_nn.put("Gc", new Double(-1.12801570402914));
modparam[6].table_H_nn.put("Gg", new Double(-0.659846271417488));
modparam[6].table_H_nn.put("Gt", new Double(-0.881015310109863));
modparam[6].table_H_nn.put("TA", new Double(-0.346920026493557));
modparam[6].table_H_nn.put("TC", new Double(-0.918176991777502));
modparam[6].table_H_nn.put("TG", new Double(-1.44038679704405));
modparam[6].table_H_nn.put("TT", new Double(-0.63544324592585));
modparam[6].table_H_nn.put("Ta", new Double(-0.509070031861056));
modparam[6].table_H_nn.put("Tc", new Double(-1.03584670655476));
modparam[6].table_H_nn.put("Tg", new Double(-1.40946877218105));
modparam[6].table_H_nn.put("Tt", new Double(-0.870845046257428));
modparam[6].table_H_nn.put("aA", new Double(-1.20602481577836));
modparam[6].table_H_nn.put("aC", new Double(-1.23960733216066));
modparam[6].table_H_nn.put("aG", new Double(-0.633900561424835));
modparam[6].table_H_nn.put("aT", new Double(-0.94390709787839));
modparam[6].table_H_nn.put("cA", new Double(-1.46284985286192));
modparam[6].table_H_nn.put("cC", new Double(-0.816866823421651));
modparam[6].table_H_nn.put("cG", new Double(-1.31737354533552));
modparam[6].table_H_nn.put("cT", new Double(-1.70861548243372));
modparam[6].table_H_nn.put("gA", new Double(-1.0656492536418));
modparam[6].table_H_nn.put("gC", new Double(-1.3976489761836));
modparam[6].table_H_nn.put("gG", new Double(-0.645745276016066));
modparam[6].table_H_nn.put("gT", new Double(-1.70832316347213));
modparam[6].table_H_nn.put("tA", new Double(-1.20474315180883));
modparam[6].table_H_nn.put("tC", new Double(-1.1530087974343));
modparam[6].table_H_nn.put("tG", new Double(-0.366342339192337));
modparam[6].table_H_nn.put("tT", new Double(-0.709379925645181));

// S_nn
modparam[6].table_S_nn.put("-aa", new Double(-21.0183983393758));
modparam[6].table_S_nn.put("-at", new Double(-20.9296841275317));
modparam[6].table_S_nn.put("-ca", new Double(-20.9035893027225));
modparam[6].table_S_nn.put("-cg", new Double(-20.6106325122583));
modparam[6].table_S_nn.put("-ct", new Double(-20.9057484730661));
modparam[6].table_S_nn.put("-ga", new Double(-20.9212322254303));
modparam[6].table_S_nn.put("-gc", new Double(-20.5901247977925));
modparam[6].table_S_nn.put("-gg", new Double(-20.7858543446338));
modparam[6].table_S_nn.put("-gt", new Double(-20.8689105244148));
modparam[6].table_S_nn.put("-ta", new Double(-20.9664930077991));
modparam[6].table_S_nn.put("DL", new Double(-19.0042167633551));
modparam[6].table_S_nn.put("LD", new Double(-19.3040440770709));
modparam[6].table_S_nn.put("LL", new Double(-37.0502230904161));

}


// End of model parameters 


public double calcTm(String inputsequence) throws BadOligoException {
    boolean debug = Boolean.valueOf(parameterProperties.getProperty("debug", "null")).booleanValue();
    int i;
    double[] results = new double[7];
    double retval;
    double sum = 0.0;
    int num = 0;

    // Run each of the seven parameter sets on the same sequence
    for (i = 0; i < 7; i++) {
        results[i] = docalcTm(modparam[i], inputsequence);
        if (debug) {
            System.out.println("Results[" + i + "]=" + results[i]);
        }
        sum += results[i]; num++;
    }

    Arrays.sort(results);

    // retval = sum / num; // Mean of results
    retval = Math.round(results[3] + 0.05); // Median of results (add 0.05 to get same result as www.lna-tm.com)

    if (debug) {
        System.out.println("Retval =" + retval + " (" + results[3] + ")");
    }

    return retval; // Median of results
}

private double docalcTm(Tm_param modparam, String inputsequence) throws BadOligoException {
    int i;
    String mon, dimer;
    Hashtable pairs = new Hashtable();
    Hashtable endmonos = new Hashtable();
    Hashtable monos = new Hashtable();

    double oligo_conc = Double.valueOf(parameterProperties.getProperty("oligo_conc", "null")).doubleValue();
    double salt_conc = Double.valueOf(parameterProperties.getProperty("salt_conc", "null")).doubleValue();
    boolean debug = Boolean.valueOf(parameterProperties.getProperty("debug", "null")).booleanValue();

    sequence = findReplace(inputsequence, "mC", "C");
    sequence = findReplace(sequence, "X", "C");
    if (debug) {
        System.out.println("Translated seq: " + sequence);
    }

    checkSequence(sequence);

    // Count endpairs
    String first_mon = sequence.substring(0, 1);
    String last_mon = sequence.substring(sequence.length() - 1, sequence.length());

    if (debug) {
        System.out.println("first_mon=" + first_mon + "; " + (String) trans_ws(first_mon) + "; " + (String) trans_LD(first_mon));
        System.out.println("last_mon=" + last_mon + "; " + (String) trans_ws(last_mon) + "; " + (String) trans_LD(last_mon));
    }

    hcount(endmonos, (String) first_mon);
    hcount(endmonos, (String) trans_ws(first_mon));
    hcount(endmonos, (String) trans_LD(first_mon));
    hcount(endmonos, (String) last_mon);
    hcount(endmonos, (String) trans_ws(last_mon));
    hcount(endmonos, (String) trans_LD(last_mon));


    // Count monomers
    for (i = 0; i < sequence.length(); i++) {
        mon = sequence.substring(i, i + 1);
        if (debug) {
            System.out.println("mon=" + mon + "; " + (String) trans_ws(mon) + "; " + (String) trans_LD(mon));
        }
        hcount(monos, (String) mon);
        hcount(monos, (String) trans_ws(mon));
        hcount(monos, (String) trans_LD(mon));
    }

    // Count nearest neighbors
    for (i = 0; i < sequence.length() - 1; i++) {
        dimer = sequence.substring(i, i + 2);
        if (debug) {
            System.out.println("dimer=" + dimer + "; " + (String) trans_ws(dimer) + "; "
                + (String) trans_LD(dimer) + "; " + (String) table_ten_translate.get(dimer));
        }
        hcount(pairs, (String) dimer);
        hcount(pairs, (String) trans_LD(dimer));
        hcount(pairs, (String) trans_ws(dimer));
        hcount(pairs, (String) table_ten_translate.get(dimer));
    }

    // Do the calculation
    Enumeration hkeys;
    String key;

    double deltaH_sum = 0, deltaS_sum = 0;

    // delta_H
    hkeys = modparam.table_deltaH_init.keys();
    while (hkeys.hasMoreElements()) {
        key = (String) hkeys.nextElement();
        if (debug) {
            System.out.println("deltaH_init: " + key + "; param="
                + hget(modparam.table_deltaH_init, key) + "; val=" + hget(endmonos, key));
        }
        deltaH_sum += hget(modparam.table_deltaH_init, key) * hget(endmonos, key);
    }

    hkeys = modparam.table_H_mono.keys();
    while (hkeys.hasMoreElements()) {
        key = (String) hkeys.nextElement();
        if (debug) {
            System.out.println("H_mono: " + key + "; param="
                + hget(modparam.table_H_mono, key) + "; val=" + hget(monos, key));
        }
        // Each end monomer contributes half weight to the monomer term
        deltaH_sum += hget(modparam.table_H_mono, key) * (hget(monos, key) - 0.5 * hget(endmonos, key));
    }

    hkeys = modparam.table_H_nn.keys();
    while (hkeys.hasMoreElements()) {
        key = (String) hkeys.nextElement();
        if (debug) {
            System.out.println("H_nn: " + key + "; param="
                + hget(modparam.table_H_nn, key) + "; val=" + hget(pairs, key));
        }
        deltaH_sum += hget(modparam.table_H_nn, key) * hget(pairs, key);
    }

    // delta_S
    hkeys = modparam.table_deltaS_init.keys();
    while (hkeys.hasMoreElements()) {
        key = (String) hkeys.nextElement();
        if (debug) {
            System.out.println("deltaS_init: " + key + "; param="
                + hget(modparam.table_deltaS_init, key) + "; val=" + hget(endmonos, key));
        }
        deltaS_sum += hget(modparam.table_deltaS_init, key) * hget(endmonos, key);
    }

    hkeys = modparam.table_S_nn.keys();
    while (hkeys.hasMoreElements()) {
        key = (String) hkeys.nextElement();
        if (debug) {
            System.out.println("S_nn: " + key + "; param="
                + hget(modparam.table_S_nn, key) + "; val=" + hget(pairs, key));
        }
        deltaS_sum += hget(modparam.table_S_nn, key) * hget(pairs, key);
    }

    // Calculate Tm
    tm = new Double((double) (-273.15 + 1000
        * (deltaH_sum / (deltaS_sum + 0.368 * (sequence.length() - 1) * Math.log(salt_conc)
        + 1.987 * Math.log(oligo_conc / 4)))));

    return tm.doubleValue();
}
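
The calculation in the listing above can be summarized in a small, self-contained sketch: count the dimers in a sequence, form a count-weighted sum of parameters, apply the final Tm expression, and take the median over several model results. This is an illustration only. The class name `TmSketch`, the helpers `countDimers`, `weightedSum`, `tmCelsius` and `median`, the use of `HashMap` in place of `Hashtable`, and the numeric inputs in `main` are all assumptions made for the example; the listing's own `hcount` and `hget` are not shown in this section, so the versions here are plausible stand-ins rather than the applicant's code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and not taken from the listing.
class TmSketch {

    // Stand-in for the listing's hcount: add 1 to the count stored under key.
    static void hcount(Map<String, Double> table, String key) {
        table.merge(key, 1.0, Double::sum);
    }

    // Stand-in for the listing's hget: a missing key is treated as 0.
    static double hget(Map<String, Double> table, String key) {
        return table.getOrDefault(key, 0.0);
    }

    // Count overlapping nearest-neighbor dimers, e.g. "ACGT" gives AC, CG, GT.
    static Map<String, Double> countDimers(String sequence) {
        Map<String, Double> pairs = new HashMap<>();
        for (int i = 0; i < sequence.length() - 1; i++) {
            hcount(pairs, sequence.substring(i, i + 2));
        }
        return pairs;
    }

    // Sum of (parameter value * occurrence count) over every key in the parameter table.
    static double weightedSum(Map<String, Double> params, Map<String, Double> counts) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : params.entrySet()) {
            sum += e.getValue() * hget(counts, e.getKey());
        }
        return sum;
    }

    // Same expression as the final step of docalcTm, returning degrees Celsius.
    static double tmCelsius(double deltaH, double deltaS, int length,
                            double saltConc, double oligoConc) {
        double denominator = deltaS
                + 0.368 * (length - 1) * Math.log(saltConc)
                + 1.987 * Math.log(oligoConc / 4);
        return -273.15 + 1000 * (deltaH / denominator);
    }

    // Median of an odd-length array; calcTm reads results[3] of seven sorted values.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Made-up parameters for two dimers; these are not values from the listing.
        Map<String, Double> hnn = new HashMap<>();
        hnn.put("AC", -1.0);
        hnn.put("CG", -2.0);
        System.out.println("deltaH_sum = " + weightedSum(hnn, countDimers("ACGT")));
        System.out.println("Tm = " + tmCelsius(-80.0, -250.0, 4, 1.0, 4.0));
        System.out.println("median = " + median(new double[] {5, 1, 4, 2, 3, 7, 6}));
    }
}
```

Taking the median rather than the mean (the mean is commented out in `calcTm`) makes the reported value less sensitive to a single parameter set that gives an outlying result.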
[Figure, number not legible. Labels: Original Sequence; Novel Sequence (Spliceform); Exon 1; Intron; Exon 2.]

[Figure, number not legible. Labels: Constant Region; Variable region; Spliceforms: Spliceform 1, Spliceform 2.]

[Figure, number not legible. Labels: Merged Probes (MP) or Junction Probes; Unique Internal Probes (UIP); Shared Internal Probes (SIP); Spliceform 1; Spliceform 2.]
[Figure 46. Legend: Experimental values; Known fold-change. Categories: DNA; LNAT; LNA3.]



[Figure 47. Legend: DNA; LNAT; LNATC. Conditions: 40°C; 40°C + 50% formamide; 55°C; 65°C.]
[Figure 49. Legend: Observed ratios; Expected ratios. Other labels preceding the caption, some of which may belong to an earlier figure whose caption did not survive: Recombinant splice variant #1; Recombinant splice variant #2; LNA 50mer; axis from 0.0 to 1.0; Cy3 and Cy5 series for spikes 01-INS3-03, 01-03 and 01-INS4-03 at 1000 ppm, 100 ppm and 10 ppm.]

[Figure 50. Legend: Observed ratios; Expected ratios.]



